Lab 2: Data Visualization
Overview
In this lab, you will find a dataset and analyze it using the pandas and seaborn Python libraries.
Materials
Step 1: Data Gathering
Find a CSV-formatted scientific dataset with multiple columns and rows of experimental results. Your dataset must have more than one numerical feature and at least one categorical feature. Here are some resources for open data:
Open a notebook and begin writing your data dictionary.
- Record the provenance of the data.
- Describe how it was collected and by whom.
- Describe the features (i.e., columns) of your data.
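One way to start the feature descriptions is to let pandas list the columns and fill in the meanings by hand. The following is only a sketch: the inline CSV text and its column names (`subject`, `group`, `height_cm`, `mass_kg`) are hypothetical stand-ins for your own dataset, and the layout of the summary table is an illustrative choice, not a required format.

```python
import io

import pandas as pd

# Hypothetical stand-in data; with a real file, use pd.read_csv("your_file.csv").
csv_text = """subject,group,height_cm,mass_kg
1,control,170.0,65.2
2,control,165.5,59.8
3,treated,180.2,82.1
4,treated,175.0,
"""
df = pd.read_csv(io.StringIO(csv_text))

# One row per column of the dataset: its type, how many values are missing,
# how many distinct values it takes, and an empty description to fill in by hand.
data_dictionary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_missing": df.isna().sum(),
    "n_unique": df.nunique(),
    "description": "",
})
data_dictionary.index.name = "column"

print(data_dictionary)
```

The table only captures what pandas can infer. Units, measurement methods, and the meaning of each category still need to be written out in a Markdown cell.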
Step 2: Analysis
Load your CSV file as a pandas DataFrame. Begin exploring your data using the methods discussed in class and used in Lab #1. Perform any data transformations that you feel add value to your analysis.
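A first pass at this step might look like the sketch below. It assumes a small hypothetical dataset given as inline text so that the example runs on its own; the column names and the derived `bmi` feature are illustrative only, and the methods shown are common pandas calls rather than a list of what Lab #1 covered.

```python
import io

import pandas as pd

# Hypothetical stand-in data; with a real file, use pd.read_csv("your_file.csv").
csv_text = """subject,group,height_cm,mass_kg
1,control,170.0,65.2
2,control,165.5,59.8
3,treated,180.2,82.1
4,treated,175.0,
"""
df = pd.read_csv(io.StringIO(csv_text))

# First look: dimensions, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Summary statistics for the numerical columns.
print(df.describe())

# Example transformations: mark the categorical column and derive a new feature.
df["group"] = df["group"].astype("category")
df["bmi"] = df["mass_kg"] / (df["height_cm"] / 100) ** 2

# Per-category means of the numerical features (missing values are skipped).
group_means = df.groupby("group", observed=True)[["height_cm", "mass_kg", "bmi"]].mean()
print(group_means)
```

Which transformations add value depends on the dataset. Derived ratios, unit conversions, and dropping or imputing missing values are typical candidates, and each one should be explained in the notebook.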
Step 3: Visualization
Continue exploring your data using visualization techniques discussed in class. Specifically,
- Draw histograms
- Draw scatter plots (with and without a linear regression fit)
- Draw categorical plots (box, violin, swarm, etc.)
- Divide the data into subcategories for further plotting
Be sure to label your axes and title your figures.
Step 4: Conclusions
Throughout your notebook, write descriptions of your findings and graphs. Draw some conclusions. Include a summary section near the end about what you learned from exploring this data.
What To Turn In
As you work on this lab, record all of your progress in a Jupyter notebook. Record your solutions and attempts in Code cells, and annotate what you did with Markdown cells. Cite all the webpages you find and use in your search for your solution. You should turn in this notebook, all of the data you used, and anything else produced during your investigation. A good solution should read like a self-contained report.
Grading
The FEV notebook linked above is a good example of my expectations for this lab, in terms of the computational analysis and the written descriptions and reflections.
- Complete: Notebook can be read easily without needing to reference this page for additional detail. It should read like a self-contained report. It can be executed without producing runtime errors. All steps (1, 2, 3, and 4) are finished. All data loaded into the notebook should be provided.
- Partially Complete: Notebook can be executed without producing runtime errors. All steps (1, 2, 3, and 4) are attempted.