Project 7: Handwriting Recognition with Decision Trees/Random Forests
Overview
You will implement the decision tree and random forest machine learning algorithms and apply them to the task of handwriting recognition.
Files
The csci335 repository contains a learning.decisiontree package that we will use in this project. Files you will modify are marked with an asterisk (*). The package contains the following files:
  - DecisionTree interface: Represents a decision tree node. Decision trees are defined recursively with these two implementations of the interface:
      - DTLeaf class: Represents a decision tree leaf.
      - DTInterior class*: Represents an interior node:
          - classify(): Recursively classifies based on the feature value. If the targeted feature value is less than or equal to maxFeatureValue, ask the left subtree. Otherwise, ask the right subtree. DTTest.testInterior() should pass when this works.
 
 
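As a rough sketch of how the classify() recursion fits together (the types and signatures below are simplified stand-ins, not the actual project interfaces):

```java
// Hypothetical simplified types: the real csci335 DecisionTree interface
// has its own generics and signatures.
interface DecisionTree {
    String classify(int[] features);
}

class DTLeaf implements DecisionTree {
    private final String label;
    DTLeaf(String label) { this.label = label; }
    // A leaf always answers its stored label.
    public String classify(int[] features) { return label; }
}

class DTInterior implements DecisionTree {
    private final int feature;          // index of the feature this node tests
    private final int maxFeatureValue;  // threshold: <= goes left, > goes right
    private final DecisionTree left, right;

    DTInterior(int feature, int maxFeatureValue, DecisionTree left, DecisionTree right) {
        this.feature = feature;
        this.maxFeatureValue = maxFeatureValue;
        this.left = left;
        this.right = right;
    }

    // If the targeted feature value is <= maxFeatureValue, ask the left
    // subtree; otherwise ask the right subtree.
    public String classify(int[] features) {
        return features[feature] <= maxFeatureValue
                ? left.classify(features)
                : right.classify(features);
    }
}
```

The recursion bottoms out at a DTLeaf, so classification cost is proportional to the depth of the tree.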
  - DTTrainer class*: Performs the decision tree learning algorithm:
      - getGini(): Calculates the Gini coefficient of a data set. Should pass DTTest.testGini().
      - gain(): Calculates the gain of two children given a parent. Should pass DTTest.testGain().
      - splitOn(): Returns a duple of two new lists of training data. The first returned list should be everything from this set for which feature has a value less than or equal to featureValue. The second returned list should be everything else from this list. Should pass DTTest.testSplit().
      - reducedFeatures(): Call allFeatures.apply() to get the feature list. Then shuffle the list, retaining only targetNumber features. Should pass DTTest.testReduced().
      - train(): Implements the decision tree learning algorithm. Should pass DTTest.testTrain().
      - resample(): Generates a new data set by sampling randomly with replacement. It should return an ArrayList that is the same length as data, where each element is selected randomly from data. Should pass DTTest.testResample().
 
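A hedged sketch of how getGini(), gain(), splitOn(), and resample() relate to one another; the Sample record and method signatures here are simplified stand-ins for the project's actual types, and splitOn() returns a two-element list in place of the project's duple:

```java
import java.util.*;

// Simplified stand-in types for the DTTrainer helpers.
class TrainerSketch {
    record Sample(int[] features, String label) {}

    // Gini impurity: 1 minus the sum of squared label proportions.
    static double getGini(List<Sample> data) {
        Map<String, Integer> counts = new HashMap<>();
        for (Sample s : data) counts.merge(s.label(), 1, Integer::sum);
        double gini = 1.0;
        for (int c : counts.values()) {
            double p = (double) c / data.size();
            gini -= p * p;
        }
        return gini;
    }

    // Gain: parent impurity minus the size-weighted impurity of the children.
    static double gain(List<Sample> parent, List<Sample> left, List<Sample> right) {
        double weighted = (left.size() * getGini(left) + right.size() * getGini(right))
                / parent.size();
        return getGini(parent) - weighted;
    }

    // First list: samples whose feature value is <= featureValue.
    // Second list: everything else.
    static List<List<Sample>> splitOn(List<Sample> data, int feature, int featureValue) {
        List<Sample> le = new ArrayList<>(), gt = new ArrayList<>();
        for (Sample s : data) (s.features()[feature] <= featureValue ? le : gt).add(s);
        return List.of(le, gt);
    }

    // Bootstrap sample: same length as data, chosen randomly with replacement.
    static ArrayList<Sample> resample(List<Sample> data, Random rng) {
        ArrayList<Sample> result = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            result.add(data.get(rng.nextInt(data.size())));
        }
        return result;
    }
}
```

A pure data set has Gini 0, and a 50/50 two-label set has Gini 0.5, so a split that separates the two labels perfectly yields a gain of 0.5.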
  - RandomForest class*:
      - classify(): Ask each tree root for its classification of the Drawing. Pick the plurality winner as the winner. I recommend using a Histogram.
 
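The plurality vote can be sketched as follows; a plain HashMap stands in for the project's Histogram class, and each "tree" is abstracted as a function from a drawing to a label:

```java
import java.util.*;
import java.util.function.Function;

// Stand-in sketch of RandomForest.classify(): tally each tree's vote,
// then return the label with the most votes.
class ForestSketch {
    static String classify(List<Function<int[], String>> trees, int[] drawing) {
        Map<String, Integer> votes = new HashMap<>();   // label -> vote count
        for (Function<int[], String> tree : trees) {
            votes.merge(tree.apply(drawing), 1, Integer::sum);
        }
        // The plurality winner is the entry with the maximum vote count.
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```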
Also examine the following files, which show the adaptation of decision trees and random forests to our target domain:
  - learning.handwriting.learners:
      - DrawingTree
      - DrawingForest30
 
Experiments
  - Perform 4-way cross-validation to assess your decision tree on each of your handwriting data sets. How well does it perform in comparison with kNN and SOM?
  - For the random forests, experiment with at least three different numbers of trees in the forest. Create a new class for each forest size.
  - Feel free to perform additional experiments to clarify any issues that may arise.
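The 4-way cross-validation bookkeeping can be sketched like this; the "learner" is abstracted as a function from a training list to a predicate that says whether a sample is classified correctly, so the project's handwriting types and trainers would plug in where these stand-ins are:

```java
import java.util.*;
import java.util.function.Function;
import java.util.function.Predicate;

// Stand-in sketch of k-way cross-validation (k = 4 in this project).
class CrossVal {
    // Deal the data round-robin into k folds.
    static <T> List<List<T>> folds(List<T> data, int k) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < k; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < data.size(); i++) parts.get(i % k).add(data.get(i));
        return parts;
    }

    // Hold out each fold once: train on the other k-1 folds, measure accuracy
    // on the held-out fold, and average over all k trials.
    static <T> double crossValidate(List<T> data, int k,
            Function<List<T>, Predicate<T>> learner) {
        List<List<T>> parts = folds(data, k);
        double total = 0;
        for (int held = 0; held < k; held++) {
            List<T> train = new ArrayList<>();
            for (int i = 0; i < k; i++) if (i != held) train.addAll(parts.get(i));
            Predicate<T> correct = learner.apply(train);
            long right = parts.get(held).stream().filter(correct).count();
            total += (double) right / parts.get(held).size();
        }
        return total / k;
    }
}
```

Averaging over all four held-out folds gives every sample exactly one turn in the test set, which makes the accuracy estimates comparable across the decision tree, kNN, and SOM runs.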
Visualization
A visualization has been provided for the handwriting problem, for both the regular Decision Tree and the Random Forest. Be sure to employ the visualizations in your analysis.
Paper
When you are finished with your experiments, write a paper summarizing your findings.  Include the following:
  - An analysis and discussion of your data. (Be sure to include the data as well.)
  - An analysis and discussion of the visualizations of the learned functions. You are strongly encouraged to include images of the visualizations in support of your analysis.
  - How well does decision tree learning perform for each task?
  - Are random forests worth the trouble? Why or why not?
  - How do these algorithms compare with k-nearest-neighbor, Naive Bayes, and self-organizing maps?
Assessment
  - Level 1: Decision tree learning implemented and works. Paper includes an analysis of decision tree learning for the handwriting domain.
  - Level 2: Random forest learning implemented and works. Paper includes an analysis of decision tree as well as random forest learning for the handwriting domain. The analysis includes a comparison of their relative benefits and drawbacks, as seen in your implementation experience and in your data.
  - Level 3: Both random forests and decision tree learning are applied to a real-world data set you have obtained beyond those employed for Levels 1 and 2.
      - You may use the same real-world data set you may have used for Level 3 of the previous assignment.
      - Your paper should include an analysis of the performance of both algorithms on that data set.
      - If you reused the data set from the previous assignment, you should also analyze the performance of these algorithms relative to the algorithms from the previous assignment.