Haskell’s focus on functions and function composition, lazy evaluation, and support for wholemeal programming make it ideal for building data analysis pipelines.
For project 2, you will create a program which reads and analyzes a data set, displaying some summary information and allowing a user to query the data set.
You should choose a data set to use for your analysis. An ideal data set has the following characteristics:
.csv
.json
Here is a collection of many nice data sets that you are welcome to choose from, or you may select your own.
Your program should read and parse the raw data into some appropriate data structure, i.e. something that represents the structure and meaning of the data, not just a list of lists of Strings or something similar. I recommend creating an algebraic data type (using record syntax) to represent one row of the data set; then read in a list of rows.
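As a rough illustration of the record-per-row idea, here is a minimal sketch. The `Flight` type, its field names, the four-column layout, and the helper names `splitOn`, `parseFlight`, and `parseFlights` are all assumptions made for this example, not the layout of any particular data set. The hand-written splitter does not handle quoted fields or embedded commas, so a real CSV library is a better fit for messier files.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical row type; the fields are illustrative only.
data Flight = Flight
  { flightId  :: Int
  , origin    :: String
  , dest      :: String
  , delayMins :: Int
  } deriving (Show, Eq)

-- Split a string on a delimiter character (no support for quoted fields).
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

-- Parse one CSV line into a Flight, or Nothing if the line is malformed.
parseFlight :: String -> Maybe Flight
parseFlight line = case splitOn ',' line of
  [i, o, d, m] -> Flight <$> readMaybe i <*> pure o <*> pure d <*> readMaybe m
  _            -> Nothing

-- Parse a whole file: skip the header line and keep only rows that parse.
parseFlights :: String -> [Flight]
parseFlights = mapMaybe parseFlight . drop 1 . lines
```

Something like `parseFlights <$> readFile "flights.csv"` would then yield a list of typed rows, assuming the file actually has the four columns sketched here. Silently dropping malformed rows is a design choice; returning `Either String Flight` instead would let the program report which rows failed.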
Optionally, you may also wish to pre-process the data set into something more structured than a list of rows: for example, a Map that allows quick lookup of rows by primary key.
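One possible shape for this pre-processing step is sketched below, using `Data.Map` from the `containers` package. The `Flight` type and the choice of `flightId` as the primary key are assumptions for illustration; `indexById` and `lookupFlight` are hypothetical helper names.

```haskell
import qualified Data.Map as Map

-- Hypothetical row type; the fields are illustrative only.
data Flight = Flight
  { flightId  :: Int
  , origin    :: String
  , dest      :: String
  , delayMins :: Int
  } deriving (Show, Eq)

-- Index rows by primary key. If a key occurs more than once,
-- Map.fromList keeps the last row with that key.
indexById :: [Flight] -> Map.Map Int Flight
indexById rows = Map.fromList [ (flightId r, r) | r <- rows ]

-- Look up a row by key; Nothing means no row has that key.
lookupFlight :: Int -> Map.Map Int Flight -> Maybe Flight
lookupFlight = Map.lookup
```

Building the map once up front means each later query is a logarithmic-time lookup rather than a scan of the whole list.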
For Level 2, your program must then prompt the user and let them choose among the analyses or queries it offers (e.g. the average score among all rows, the median of all departure times, the sum of the salaries from a certain state, …).
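The kinds of analyses mentioned above could be sketched roughly as follows. The function names `average`, `median`, and `totalsByKey` are made up for this example, and the functions work on plain lists of numbers that you would first extract from your rows with `map` or a list comprehension.

```haskell
import Data.List (sort)
import qualified Data.Map as Map

-- Mean of a list, or Nothing for an empty list.
average :: [Double] -> Maybe Double
average [] = Nothing
average xs = Just (sum xs / fromIntegral (length xs))

-- Median of a list, or Nothing for an empty list.
-- For an even number of elements, take the mean of the two middle values.
median :: [Double] -> Maybe Double
median [] = Nothing
median xs
  | odd n     = Just (sorted !! mid)
  | otherwise = Just ((sorted !! (mid - 1) + sorted !! mid) / 2)
  where
    sorted = sort xs
    n      = length xs
    mid    = n `div` 2

-- Sum the values for each key, e.g. total salary per state.
totalsByKey :: Ord k => [(k, Double)] -> Map.Map k Double
totalsByKey = Map.fromListWith (+)
```

Returning `Maybe` for the empty case avoids a division by zero or an out-of-range index when a query matches no rows, which is easy to hit once users can filter the data.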
A recursive IO action is the way to write a recurring menu! A very simple version might look like this:
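    import Control.Monad (when)
    import System.IO (hFlush, stdout)

    -- doThingA and doThingB are placeholders for your own IO actions.
    menu :: IO ()
    menu = do
      putStr "Your choice? "
      hFlush stdout  -- show the prompt before waiting for input
      choice <- getLine
      case choice of
        "A" -> doThingA
        "B" -> doThingB
        _   -> pure ()
      -- Show the menu again unless the user asked to quit.
      when (choice /= "quit") menu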
I have provided a sample project to help get you started and give you some ideas. Note that it is only a bare minimum Level 1 project. If you’d like to take a look at it, you should download it and unzip it somewhere.
The sample project contains:
README.md
flights-demo.cabal
flights.csv
app-simple/Main.hs
lines
splitAt
app-cassava/Main.hs
cassava
You are welcome to use the sample project as a starting point for your project, or you can create a new project from scratch.
You should turn in a .zip or .tgz file containing an entire Cabal package, along with the data file(s) needed by your project.
author
maintainer
.cabal
LICENSE
map
filter
Data.Map.fromListWith
(sum(X) + sum(Y)) / 2