Project 5: Distance-based learning

Overview

You will implement the k-nearest-neighbor (kNN) and self-organizing map (SOM) machine learning algorithms and apply them to handwriting recognition.

Files

The csci335 repository contains a learning package that we will be using in this project. It contains many files. The files in which you will write code are:

  • classifiers.Knn: Implementation of the k-nearest-neighbor algorithm. It should pass the unit tests in classifiers.KnnTest when complete.
  • classifiers.SOMRecognizer: Supervised learning algorithm using a SOM.
  • som.SelfOrgMap: Implementation of the self-organizing map. It should pass the unit tests in som.SelfOrgMapTest when complete.
  • handwriting.learners: Configured handwriting learners go here. A few examples are provided.
  • handwriting.core.FloatDrawing: Alternative handwriting representation for SOM.
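
As a rough orientation for the self-organizing map, the following is a minimal sketch of one training step on plain double[] vectors. It is not the interface of som.SelfOrgMap: the class name SomSketch, the square grid, the Gaussian neighborhood, and the learningRate and radius parameters are all assumptions made for illustration, and the unit tests in som.SelfOrgMapTest may expect a different update rule.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; not the repository's som.SelfOrgMap.
class SomSketch {
    // weights[x][y] is the weight vector stored at grid node (x, y).
    final double[][][] weights;

    SomSketch(int side, int dim, Random rng) {
        weights = new double[side][side][dim];
        for (int x = 0; x < side; x++)
            for (int y = 0; y < side; y++)
                for (int i = 0; i < dim; i++)
                    weights[x][y][i] = rng.nextDouble();
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Grid coordinates of the node whose weights are closest to the input.
    int[] bestMatchingUnit(double[] input) {
        int[] best = {0, 0};
        for (int x = 0; x < weights.length; x++) {
            for (int y = 0; y < weights[x].length; y++) {
                if (squaredDistance(weights[x][y], input)
                        < squaredDistance(weights[best[0]][best[1]], input)) {
                    best = new int[] {x, y};
                }
            }
        }
        return best;
    }

    // Pull every node toward the input; nodes near the best match move the most.
    // The Gaussian falloff is an assumed neighborhood function.
    void train(double[] input, double learningRate, double radius) {
        int[] bmu = bestMatchingUnit(input);
        for (int x = 0; x < weights.length; x++) {
            for (int y = 0; y < weights[x].length; y++) {
                double dx = x - bmu[0];
                double dy = y - bmu[1];
                double influence = Math.exp(-(dx * dx + dy * dy) / (2.0 * radius * radius));
                for (int i = 0; i < input.length; i++) {
                    weights[x][y][i] += learningRate * influence * (input[i] - weights[x][y][i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        SomSketch som = new SomSketch(4, 2, new Random(42));
        double[] input = {0.9, 0.1};
        som.train(input, 0.5, 1.0);
        int[] bmu = som.bestMatchingUnit(input);
        System.out.println("Best match at (" + bmu[0] + ", " + bmu[1] + ")");
    }
}
```

In practice the learning rate and radius are usually reduced as training proceeds, so the map settles; the schedule is left out here to keep the sketch short.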

Other files of particular interest include:

  • handwriting.gui.DrawingEditor: Run this to create handwriting samples and test your learners.
  • core.Histogram: Implements a Histogram data type. This will be useful for kNN among other things.
  • core.Classifier: Interface for our machine learning algorithms.
  • handwriting.core.Drawing: Data type for handwriting samples. Drawing::distance is used by kNN.
  • handwriting.core.SOMDrawingBridge: Classifier front-end that turns Drawing objects into FloatDrawing objects to make life easier for the self-organizing map.
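
To make the roles of the distance function and the histogram concrete, here is a minimal kNN sketch on plain double[] vectors. It does not use the repository's classes: KnnSketch and Example are hypothetical names, Euclidean distance stands in for Drawing::distance, and a HashMap of vote counts stands in for core.Histogram. The actual classifiers.Knn must follow the core.Classifier interface instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the repository's classifiers.Knn.
class KnnSketch {
    // A labeled training example (hypothetical type for illustration).
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Euclidean distance; a stand-in for Drawing::distance.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the most common label among the k examples nearest to the query.
    static String classify(List<Example> training, double[] query, int k) {
        List<Example> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(e -> distance(e.features, query)));

        Map<String, Integer> votes = new HashMap<>(); // plays the role of a histogram
        String best = null;
        int limit = Math.min(k, sorted.size());
        for (int i = 0; i < limit; i++) {
            String label = sorted.get(i).label;
            int count = votes.merge(label, 1, Integer::sum);
            // On a tie, the label that reached the count first (nearer neighbors) stays.
            if (best == null || count > votes.get(best)) {
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Example> training = List.of(
                new Example(new double[] {0, 0}, "a"),
                new Example(new double[] {0, 1}, "a"),
                new Example(new double[] {5, 5}, "b"),
                new Example(new double[] {5, 6}, "b"));
        System.out.println(classify(training, new double[] {0.2, 0.1}, 3));
    }
}
```

Sorting the whole training set is the simplest approach and is adequate for a few hundred drawings; larger data sets would call for a partial selection of the k nearest.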

Experiments

  • Complete the implementations of the above files as specified.
  • Using DrawingEditor, draw 20 samples each of two letters. For each drawing, click the “Record” button when it is complete. For the label, use the letter that you drew. Once this is complete, save the file (using the Save command on the File menu).
  • Test the performance of Knn3 on these two letters. Use the Assess menu option under the Learner menu, and perform 4-way cross-validation.
  • Expand your data set to train it to distinguish three letters. Save the expanded data set under a different filename.
  • Continue iterating this process until you can build a classifier that can distinguish at least eight different letters.
  • Compare the performance of k=3 against two other values of k.
  • Repeat this process with the self-organizing map. Compare performance with three different map sizes.
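
For readers unfamiliar with cross-validation, the sketch below shows one way the samples could be split, assuming that "4-way" means the usual 4-fold scheme: the data is shuffled and dealt into four groups, and each group in turn is held out for testing while the other three are used for training. The class CrossValidationSketch and its partition method are hypothetical; the Assess menu option may partition the data differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of k-fold partitioning; not the DrawingEditor's Assess code.
class CrossValidationSketch {
    // Shuffle the indices 0..n-1 and deal them into `folds` nearly equal groups.
    static List<List<Integer>> partition(int n, int folds, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % folds).add(indices.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        int samples = 40; // e.g. 20 drawings each of two letters
        List<List<Integer>> folds = partition(samples, 4, 1L);
        for (int held = 0; held < folds.size(); held++) {
            int testSize = folds.get(held).size();
            // Train on every fold except `held`, then test on `held`.
            System.out.println("Round " + held + ": train on " + (samples - testSize)
                    + " samples, test on " + testSize);
        }
    }
}
```

Averaging the accuracy over the four rounds gives a steadier estimate than a single train/test split, which matters with only 20 samples per letter.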

Paper

When you are finished with your experiments, write a paper summarizing your findings. Include the following:

  • An analysis and discussion of your data. (Be sure to include the data as well.)
  • What effect did variations of the value of k have?
  • What effect did variations in map size have?
  • What insights did you gain from the SOM visualizations?
  • Compare the performance of kNN and the SOM for the handwriting classification tasks. Which worked better? Why?
  • Beyond the actual results, what other issues are noteworthy?

Submission

  • Post your code on GitHub in your private repository. Make sure the instructor is added as a collaborator.
  • Upload your paper to Microsoft Teams.

Grading Criteria

  • Level 1
    • The kNN algorithm is implemented and functional.
    • The paper includes an analysis of kNN for the handwriting problem.
  • Level 2
    • The self-organizing map algorithm is implemented and functional.
    • The paper includes all of the analysis listed in the Paper section above.
  • Level 3
    • Find a real-world data set that you would like to explore. Do the following:
      • Perform unsupervised learning on the data set with the self-organizing map and produce a visualization.
        • What insight do you get about the data from this process? Analyze in your paper.
      • Perform supervised learning on the data set comparing kNN and the SOM.
        • Which produced the better performance? Why?