In this lab, you will find a dataset and analyze it using the pandas and plotnine Python libraries.
Find a CSV-formatted scientific dataset with multiple columns and rows of experimental results. Your dataset must have more than one numerical feature and at least one categorical feature. Here are some resources for open data:
Open a notebook and begin writing your data dictionary.
Load your CSV file as a pandas data frame. Begin exploring your data using the methods discussed in class and used in Lab #1. Perform any data transformations that you feel add value to your analysis.
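As a starting point, the loading and exploration step might look like the minimal sketch below. The column names and values are hypothetical stand-ins (your dataset will differ), the CSV is read from an in-memory string only so the example runs on its own, and the methods shown are common pandas calls that may not match exactly what was covered in class or Lab #1.

```python
import io

import pandas as pd

# Hypothetical stand-in for your CSV file. With a real file you would
# call pd.read_csv("your_file.csv") instead (file name is an assumption).
csv_text = """subject,group,height_cm,weight_kg
1,control,170.2,65.1
2,treated,165.4,
3,control,180.0,82.3
4,treated,158.9,54.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# First look: size, column types, and the first few rows.
print(df.shape)
print(df.dtypes)
print(df.head())

# Summary statistics for numerical columns and counts for a categorical one.
print(df.describe())
print(df["group"].value_counts())

# Count missing values per column before deciding how to handle them.
n_missing = df.isna().sum()
print(n_missing)

# Check the dataset requirements: more than one numerical feature
# and at least one categorical feature.
numeric_cols = list(df.select_dtypes(include="number").columns)
categorical_cols = list(df.select_dtypes(exclude="number").columns)
print(numeric_cols, categorical_cols)

# Example transformations: mark the grouping column as categorical and
# derive a new feature (body mass index) from two existing columns.
df["group"] = df["group"].astype("category")
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
```

Whether a derived column such as `bmi` adds value depends on your data and question; explain any transformation you keep in a Markdown block.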
Continue exploring your data using visualization techniques discussed in class. Specifically,
Be sure to label your axes and title your figures.
Throughout your notebook, write descriptions of your findings and graphs. Draw some conclusions. Include a summary section near the end about what you learned from exploring this data.
As you work on this lab, record all of your progress in a Jupyter notebook. Record your solutions and attempts in Code blocks, and annotate what you did with Markdown blocks. Cite every webpage you use while searching for your solution. Turn in this notebook, all of the data you used, and anything else produced during your investigation. A good solution should read like a self-contained report.
The FEV notebook linked above is a good example of my expectations for this lab, in terms of the computational analysis and the written descriptions and reflections.
Complete: Notebook can be read easily without needing to reference this page for additional detail. It should read like a self-contained report. It can be executed without producing runtime errors. All steps (1, 2, 3, and 4) are finished. All data loaded into the notebook is provided.
Partially Complete: Notebook can be executed without producing runtime errors. All steps (1, 2, 3, and 4) are attempted.