In this exam, you will demonstrate your mastery of CSCI 285 concepts in three parts. Part #1 focuses on analyzing a dataset with pandas. Part #2 focuses on visualizing it. Part #3 asks you to take the dataset and perform KMeans clustering and PCA.
The iris flower dataset is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems”. It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
To earn a Partially Complete on the exam, you must:
Load iris.csv into a pandas data frame. Record the shape of the data frame.
Display the first 10 rows of the dataset.
Display the data types of the columns.
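As a rough illustration of the three steps above, the following sketch loads a CSV into a pandas data frame and inspects it. The column names and the handful of inline rows are assumptions standing in for the real iris.csv, which is not reproduced here; in the exam notebook the call would be pd.read_csv("iris.csv") instead.

```python
import io
import pandas as pd

# Stand-in for iris.csv: a few rows with ASSUMED column names.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# In the exam notebook this would read the actual file: pd.read_csv("iris.csv")
iris = pd.read_csv(io.StringIO(SAMPLE_CSV))

shape = iris.shape          # tuple of (number of rows, number of columns)
first_rows = iris.head(10)  # the first 10 rows, or fewer if the frame is shorter
column_types = iris.dtypes  # one dtype entry per column

print(shape)
print(first_rows)
print(column_types)
```

With the full data set described above (50 samples from each of three species), the shape should report 150 rows.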
To earn a Complete on the exam, you must also:
Display the count of each value in the species column.
Determine the number of irises where sepal width is greater than 3 cm.
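The counting and filtering tasks above can be sketched as follows. The tiny data frame and the column names sepal_width and species are assumptions for illustration only; the real column names depend on the header of iris.csv.

```python
import pandas as pd

# Small illustrative frame; column names are ASSUMED, not taken from iris.csv.
iris = pd.DataFrame({
    "sepal_width": [3.5, 3.0, 3.2, 2.8, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Count how many rows belong to each species.
species_counts = iris["species"].value_counts()

# A boolean mask marks rows with sepal width strictly greater than 3 cm;
# summing the mask counts the True values.
wide_sepal_count = int((iris["sepal_width"] > 3).sum())

print(species_counts)
print(wide_sepal_count)
```

Note that the comparison is strict, so a flower with a sepal width of exactly 3.0 cm is not counted.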
Next, use plotnine to visualize the dataset.
Finally, continue working with the iris flower data set by performing KMeans clustering and PCA.
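One possible shape for the KMeans and PCA step is sketched below using scikit-learn. Several choices here are assumptions rather than exam requirements: the copy of the data bundled with scikit-learn (load_iris) stands in for iris.csv, three clusters are chosen because the data set contains three species, and the features are standardized before clustering.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: scikit-learn's bundled iris data stands in for iris.csv.
features = load_iris().data  # array of shape (150, 4)

# Standardize so each measurement contributes on a comparable scale.
scaled = StandardScaler().fit_transform(features)

# ASSUMPTION: 3 clusters, matching the three species in the data set.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(scaled)

# Project the four features onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

print(cluster_labels[:10])
print(components[:5])
print(pca.explained_variance_ratio_)
```

The two-column components array can then be plotted with the cluster labels as colors to see how well the clusters separate in the reduced space.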
Submit a Jupyter notebook that begins with the following statement:
All of the below work is my own. I adhered to the test-taking procedure by not receiving any help from my peers or generative AI. I have cited all resources I found online or from notebooks shared from class that helped me complete this exam.