Project 2: Vacuum Cleaner
Overview
By rewarding the robot when it vacuums up dirt and penalizing it when it hits
an obstacle, we will use Q-Learning to train a robot to clean the dirt from a
floor.
Files
The csci335 repository contains a robosim package that we will be using in this
project. Files you modify are marked with an asterisk (*). It contains the
following packages and files:
- robosim.core: Contains numerous files. Those of particular interest are
  Simulator, Action, SimObject, Obstacle, Dirt, Robot, Polar, Direction, and
  Controller.
- robosim.ai*: Package for robot controllers. This is where your Q-Learners
  will go. Currently contains:
  - RandomController: Moves randomly.
  - ActionHuman: Does nothing; lets a human pilot the robot.
  - BasicAvoider: Drives forward while avoiding obstacles.
  - DirtChaser: Tries to vacuum dirt while avoiding obstacles.
- robosim.reinforcement: Q-Learning files.
  - QTable*: You will complete the following methods in this file:
    getLearningRate(), getBestAction(), isExploring(), leastVisitedAction(),
    and senseActLearn().
  - QTableTest: Unit tests for QTable.
Level 1: Implementing Q-Learning
Create an implementation of the Q-Learning algorithm by completing the five
methods listed above in the QTable class. All unit tests in QTableTest must
pass in order to earn credit.
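For orientation, here is a minimal sketch of the value update that
senseActLearn() ultimately has to perform. The class, field, and method names
below are invented for illustration and do not match the starter code, and the
sketch omits exploration, visit counting, and learning-rate decay, all of which
the real QTable must also handle.

// Hypothetical sketch of the core Q-Learning update; names are illustrative only.
// The update applied on each learning step is:
//   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
public class QSketch {
    private final double[][] q;      // q[state][action]: estimated long-term value
    private final double discount;   // gamma: how much future reward matters

    public QSketch(int numStates, int numActions, double discount) {
        this.q = new double[numStates][numActions];
        this.discount = discount;
    }

    // We took lastAction in lastState, received reward, and observed newState.
    // alpha is the (decaying) learning rate for this visit.
    public void update(int lastState, int lastAction, double reward,
                       int newState, double alpha) {
        double bestNext = q[newState][0];
        for (double v : q[newState]) {
            bestNext = Math.max(bestNext, v);
        }
        q[lastState][lastAction] +=
            alpha * (reward + discount * bestNext - q[lastState][lastAction]);
    }
}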
Level 2: Obstacle Avoidance
Create a Q-Learner that automatically learns the following task:
- Drive forward as much as possible
- Avoid hitting objects
To create this Q-Learner, you will need to do the following:
- Determine how the robot’s sensory information will be transformed into an
  integer set of states.
- Devise a reward scheme to incentivize the desired behavior. (A sketch of one
  possible encoding and reward scheme appears after this list.)
- Select an initial combination of the following:
- Discount rate
- Learning rate decay constant
- Target number of visits to control exploration
- Number of time steps to run the simulator
- Create a class in the robosim.ai package that implements the Controller
  interface.
- Each object of this class should have a QTable instance variable.
- In your constructor, initialize the QTable with the appropriate values as
  determined earlier.
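As a concrete but entirely hypothetical illustration of the first two
decisions, the sketch below buckets two distance sensors into a small integer
state space and assigns a simple reward. Every sensor name, threshold, and
reward magnitude here is an assumption rather than part of the starter code;
the resulting state and reward are what your controller would pass to the
QTable's senseActLearn() method each time step.

// Illustrative only: the sensors, thresholds, and reward values are assumptions.
public class ObstacleAvoidanceDesign {
    // Example encoding: 3 distance buckets for a left sensor x 3 buckets for a
    // right sensor = 9 states, numbered 0..8.
    static int encodeState(double leftDistance, double rightDistance) {
        return 3 * bucket(leftDistance) + bucket(rightDistance);
    }

    private static int bucket(double distance) {
        if (distance < 0.5) return 0;   // dangerously close
        if (distance < 2.0) return 1;   // nearby
        return 2;                       // clear
    }

    // Example reward: strongly penalize collisions, mildly reward forward motion.
    static double reward(boolean collided, boolean movedForward) {
        if (collided) return -10.0;
        return movedForward ? 1.0 : 0.0;
    }
}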
Next, create and save three different maps to evaluate the robot’s performance.
These maps should include obstacles but need not include any dirt.
Having created the maps, perform the following experiments on each of the three maps:
- Run your initial Q-Learner for the specified number of time steps.
- Record its performance in terms of steps moving forward and number of collisions.
- Run RandomController and BasicAvoider for the same number of time steps, and
  record their performance.
- Identify one or more parameters to vary. Run your Q-Learner with the new parameters,
and record its performance.
- Repeat a third time with another variation of parameters.
In total, you will run five experiments per map (one RandomController, one
BasicAvoider, and three Q-Learning experiments with different parameters), for
a total of 15 experiments.
Level 3: Housekeeping
Create a new Q-Learner that automatically learns the following task:
- Collect as much dirt as possible
- Avoid hitting objects
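The main change from Level 2 is the reward scheme: rather than rewarding
forward motion, reward the robot when it actually vacuums dirt. Below is a
minimal, hypothetical replacement for the reward() method in the earlier
sketch; how you detect that dirt was collected depends on the simulator's
actual API, so the dirtCollectedThisStep flag is an assumption you would need
to derive from the robosim.core classes.

// Hypothetical reward for the housekeeping task; the dirt-collection flag is
// an assumption, not an actual simulator method.
static double reward(boolean collided, boolean dirtCollectedThisStep) {
    if (collided) return -10.0;                // still punish collisions
    return dirtCollectedThisStep ? 5.0 : 0.0;  // reward vacuuming dirt
}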
Repeat the methodology and experiments from Level 2, with the following variations:
- Create and save three maps that contain dirt. You may add dirt to the
  existing maps or create entirely new ones.
- Instead of comparing to BasicAvoider, compare to DirtChaser. Continue to
  compare to RandomController as well.
- You will run an additional 15 experiments in total.
Paper
When you are finished with your experiments, write a paper summarizing your findings. Include the following:
- Descriptions of all experiments and parameters.
- Be sure to include the exact parameter values used in each Q-Learning experiment.
- Also include the exact number of time steps for which the simulation ran.
- Also describe exactly how your reward function works, how your states are
  encoded, and which actions the robot has available.
- Images of all of your maps.
- Tables presenting all data collected.
- Assess the impact of each of the four key experimental parameters and each of the maps.
- Assess the degree to which the learned behavior represents an improvement
  over random action selection.
- Also discuss how well the learned behavior performs in comparison with the
hand-crafted controller.