Project Report Due: 9:00am Monday June 2
Project Presentations: Week of June 2

Overall Project Goals

The goals of this project, which you will carry out with your team, are:

  1. To work as a group to apply an appropriate machine learning technique to a dataset from the UCI Machine Learning Repository. Machine learning methods fall, broadly, into three categories: classification, clustering and prediction. Each group will be assigned a dataset and one of these broad categories. These datasets do not necessarily fall into the category of “Big Data,” but many machine learning techniques can be scaled up to much larger datasets.

  2. To produce a short report describing the method you used (with appropriate references) and summarizing your findings.

  3. To make a presentation giving more details about the method you used (with appropriate references) and your findings to the entire class.

Project Resources

Data for this project come from the UCI Machine Learning Repository. Each group will be assigned one dataset from this repository and one task.

CRAN task views list R packages relevant to a specific task. You might find these two helpful for finding methods implemented in R:

This site gives a pretty good list of methods too:

Project Deliverables

  1. By Monday, June 2 at 9:00am: Submit a two-page (reasonable margins and font size) document that summarizes the following:

    • Project/data background, the category of machine learning method (classification, clustering or prediction) you used and how it applies to (i.e., is relevant for) your data.
    • A description of the machine learning method(s) you used, with appropriate references to additional resources.
    • Your findings from applying the machine learning method(s) to your data.
    • A discussion about assumptions/limitations of the approach(es) you used, and information about how the method(s) could be scaled up to even bigger datasets.
  2. By the week of June 2: Make a 20-minute presentation of your project. The order in which teams present will be determined randomly during the week of June 2. Each presentation must consist of four sections: (1) Introduction and Overview; (2) Detailed description of the machine learning method(s) you used; (3) Summary of findings from applying your method(s) to your data; (4) Discussion, including assumptions/limitations of the method(s) and scalability. These sections need not be of equal length, but the total presentation must not exceed 20 minutes. All members of your team must be prepared to deliver all four sections. Which team member presents which section will be determined randomly immediately before the presentation begins.

    Presentation slides must be made available in PDF format so that they can be posted on the course website. Send them to Charlotte or Alix before 9:00am on the day your group presents.

  3. By Friday, June 6 at 9:50am: You must give Charlotte and Alix access to your Git repository, which must contain well-documented R code that completely reproduces the content of your summary report and your presentation.

  4. By Friday, June 6 at 9:50am: Each group member must turn in a completed Group Member Evaluation Form for every other member of his/her team.