Project 5: Distance-based learning
Overview
You will implement the k-nearest-neighbor and self-organizing map machine learning algorithms, and apply them to the task of Handwriting Recognition.
Files
The csci335 repository contains a learning package that we will be using in this project. It contains many files. The files in which you will write code are:
- classifiers.Knn: Implementation of the k-nearest-neighbor algorithm. It should pass the unit tests in classifiers.KnnTest when complete.
- classifiers.SOMRecognizer: Supervised learning algorithm using a SOM.
- som.SelfOrgMap: Implementation of the self-organizing map. It should pass the unit tests in som.SelfOrgMapTest when complete.
- handwriting.learners: Configured handwriting learners go here. A few examples are provided.
- handwriting.core.FloatDrawing: Alternative handwriting representation for SOM.
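As a point of reference for the k-nearest-neighbor algorithm named above, here is a minimal stand-alone sketch. It is not the classifiers.Knn API from the repository: the class name KnnSketch, its train and classify methods, and the distance callback are all hypothetical, chosen only for illustration. It stores every labeled example and classifies a query by majority vote among the k closest ones, with ties going to the label that reaches the winning count first (nearest examples are counted first).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration only; the course's classifiers.Knn has its own interface.
final class KnnSketch<V, L> {
    // One labeled training example.
    private static final class Example<V, L> {
        final V value;
        final L label;

        Example(V value, L label) {
            this.value = value;
            this.label = label;
        }
    }

    private final int k;
    private final ToDoubleBiFunction<V, V> distance;
    private final List<Example<V, L>> examples = new ArrayList<>();

    KnnSketch(int k, ToDoubleBiFunction<V, V> distance) {
        this.k = k;
        this.distance = distance;
    }

    // "Training" only stores the example; all the work happens at query time.
    void train(V value, L label) {
        examples.add(new Example<>(value, label));
    }

    // Majority vote among the k stored examples closest to the query.
    L classify(V query) {
        if (examples.isEmpty()) {
            throw new IllegalStateException("no training data");
        }
        List<Example<V, L>> sorted = new ArrayList<>(examples);
        sorted.sort(Comparator.comparingDouble(e -> distance.applyAsDouble(query, e.value)));

        Map<L, Integer> votes = new HashMap<>();
        L best = null;
        int bestCount = 0;
        for (Example<V, L> e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(e.label, 1, Integer::sum);
            if (count > bestCount) { // strict ">" keeps the earlier (nearer) label on ties
                best = e.label;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy one-dimensional data with absolute difference as the distance.
        KnnSketch<Double, String> knn = new KnnSketch<>(3, (a, b) -> Math.abs(a - b));
        knn.train(1.0, "a");
        knn.train(1.2, "a");
        knn.train(0.8, "a");
        knn.train(5.0, "b");
        knn.train(5.3, "b");
        System.out.println(knn.classify(1.1)); // the three nearest examples are all "a"
    }
}
```

Sorting the full example list is the simplest approach and is fine for a few hundred drawings; a bounded priority queue would avoid the full sort on larger data sets.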
Other files of particular interest include:
- handwriting.gui.DrawingEditor: Run this to create handwriting samples and test your learners.
- core.Histogram: Implements a Histogram data type. This will be useful for kNN among other things.
- core.Classifier: Interface for our machine learning algorithms.
- handwriting.core.Drawing: Data type for handwriting samples. Drawing::distance is used by kNN.
- handwriting.core.SOMDrawingBridge: Classifier front-end that turns Drawing objects into FloatDrawing objects to make life easier for the self-organizing map.
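For orientation, one training step of a self-organizing map can be sketched as follows. This is a generic textbook-style version, not the som.SelfOrgMap design in the repository: the class SomSketch, its method names, the square grid of double[] weight vectors, and the Gaussian neighborhood with a caller-supplied learning rate and radius are all assumptions made for illustration. The update rule specified in the project code and its unit tests may differ.

```java
import java.util.Random;

// Hypothetical illustration only; the course's som.SelfOrgMap has its own interface.
final class SomSketch {
    private final int side;
    private final double[][][] weights; // weights[row][col] is that node's weight vector

    SomSketch(int side, int dimensions, long seed) {
        this.side = side;
        this.weights = new double[side][side][dimensions];
        Random random = new Random(seed);
        for (double[][] row : weights) {
            for (double[] node : row) {
                for (int i = 0; i < node.length; i++) {
                    node[i] = random.nextDouble(); // random initial weights in [0, 1)
                }
            }
        }
    }

    // Returns {row, col} of the node whose weights are closest to the input.
    int[] bestMatchingUnit(double[] input) {
        int[] best = {0, 0};
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int r = 0; r < side; r++) {
            for (int c = 0; c < side; c++) {
                double d = squaredDistance(weights[r][c], input);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = new int[] {r, c};
                }
            }
        }
        return best;
    }

    // One training step: pull every node toward the input, scaled by a
    // Gaussian of its grid distance from the best matching unit.
    void train(double[] input, double learningRate, double radius) {
        int[] bmu = bestMatchingUnit(input);
        for (int r = 0; r < side; r++) {
            for (int c = 0; c < side; c++) {
                double gridDistSq = Math.pow(r - bmu[0], 2) + Math.pow(c - bmu[1], 2);
                double influence = learningRate * Math.exp(-gridDistSq / (2 * radius * radius));
                double[] node = weights[r][c];
                for (int i = 0; i < node.length; i++) {
                    node[i] += influence * (input[i] - node[i]);
                }
            }
        }
    }

    // Copy of one node's weights, so callers cannot modify the map.
    double[] weightsAt(int row, int col) {
        return weights[row][col].clone();
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        SomSketch som = new SomSketch(3, 2, 42L);
        double[] input = {0.9, 0.1};
        int[] bmu = som.bestMatchingUnit(input);
        double before = squaredDistance(som.weightsAt(bmu[0], bmu[1]), input);
        som.train(input, 0.5, 1.0);
        double after = squaredDistance(som.weightsAt(bmu[0], bmu[1]), input);
        System.out.println("BMU distance shrank: " + (after < before));
    }
}
```

Because neighbors of the best matching unit move too, nearby nodes end up with similar weights, which is what makes the trained map useful as a visualization.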
Experiments
- Complete the implementations of the above files as specified.
- Using DrawingEditor, draw 20 samples each of two letters. For each drawing, click the “Record” button when it is complete. For the label, use the letter that you drew. Once this is complete, save the file (using the Save command on the File menu).
- Test the performance of Knn3 on these two letters. Use the Assess menu option under the Learner menu, and perform 4-way cross-validation.
- Expand your data set to train it to distinguish three letters. Save the expanded data set under a different filename.
- Continue iterating this process until you can build a classifier that can distinguish at least eight different letters.
- Compare the performance of k=3 against two other values of k.
- Repeat this process with the self-organizing map. Compare performance with three different map sizes.
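To make the cross-validation step concrete, the sketch below shows one way k-fold cross-validation can be computed. It is not how DrawingEditor's Assess option is implemented: the class CrossValidationSketch, the Trainer callback, and the choice to assign sample i to fold i % folds are assumptions for illustration. Each fold is held out once for testing while a fresh model is trained on the remaining folds, and the result is the overall fraction of held-out samples labeled correctly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not the assessment code used by DrawingEditor.
final class CrossValidationSketch {
    // Hypothetical callback: builds a model from training data and returns
    // it as a function from a sample to its predicted label.
    interface Trainer<V, L> {
        Function<V, L> train(List<V> values, List<L> labels);
    }

    // Fraction of samples labeled correctly when each fold is held out in turn.
    static <V, L> double accuracy(List<V> values, List<L> labels, int folds, Trainer<V, L> trainer) {
        if (values.size() != labels.size()) {
            throw new IllegalArgumentException("values and labels differ in length");
        }
        if (folds < 2 || folds > values.size()) {
            throw new IllegalArgumentException("folds must be between 2 and the sample count");
        }
        int correct = 0;
        for (int fold = 0; fold < folds; fold++) {
            List<V> trainValues = new ArrayList<>();
            List<L> trainLabels = new ArrayList<>();
            List<Integer> testIndices = new ArrayList<>();
            for (int i = 0; i < values.size(); i++) {
                if (i % folds == fold) {
                    testIndices.add(i); // held out for this round
                } else {
                    trainValues.add(values.get(i));
                    trainLabels.add(labels.get(i));
                }
            }
            Function<V, L> model = trainer.train(trainValues, trainLabels);
            for (int i : testIndices) {
                if (labels.get(i).equals(model.apply(values.get(i)))) {
                    correct++;
                }
            }
        }
        return (double) correct / values.size();
    }

    public static void main(String[] args) {
        List<Double> values = List.of(1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2);
        List<String> labels = List.of("a", "a", "a", "a", "b", "b", "b", "b");
        // Toy 1-nearest-neighbor trainer on one-dimensional data.
        Trainer<Double, String> nearest = (vs, ls) -> q -> {
            int best = 0;
            for (int i = 1; i < vs.size(); i++) {
                if (Math.abs(vs.get(i) - q) < Math.abs(vs.get(best) - q)) {
                    best = i;
                }
            }
            return ls.get(best);
        };
        System.out.println(accuracy(values, labels, 4, nearest));
    }
}
```

Interleaving samples across folds keeps each fold roughly balanced when samples were recorded one letter at a time; shuffling before splitting is a common alternative.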
Paper
When you are finished with your experiments, write a paper summarizing your findings. Include the following:
- An analysis and discussion of your data. (Be sure to include the data as well.)
- What effect did variations of the value of k have?
- What effect did variations in map size have?
- What insights did you gain from the SOM visualizations?
- Compare the performance of kNN and the SOM for the handwriting classification tasks. Which worked better? Why?
- Beyond the actual results, what other issues are noteworthy?
Submission
- Post your code on GitHub in your private repository. Make sure the instructor is added as a collaborator.
- Upload your paper to Microsoft Teams.
Grading Criteria
- Level 1
  - The kNN algorithm is implemented and functional.
  - The paper includes an analysis of kNN for the handwriting problem.
- Level 2
  - The self-organizing map algorithm is implemented and functional.
  - The paper includes all of the above analysis.
- Level 3
  - Find a real-world data set that you would like to explore. Do the following:
    - Perform unsupervised learning on the data set with the self-organizing map and produce a visualization.
      - What insight do you get about the data from this process? Analyze in your paper.
    - Perform supervised learning on the data set comparing kNN and the SOM.
      - Which produced the best performance? Why?