Project Report Due: 9:00am Monday May 12
Project Presentations: Week of May 12

Overall Project Goals

The goals of this project are to work with your team:

  1. To answer a specific question assigned to your group using the on-time flight data from the Bureau of Transportation Statistics. The question should be answered two ways: (a) by using all of the data to come up with a population-based answer and (b) by using an appropriate sampling technique to come up with a sample-based answer (with appropriate quantification of uncertainty).

  2. To produce a short report summarizing your findings.

  3. To make a presentation giving more details about your findings to the entire class.
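
As a rough illustration of the two kinds of answer described in goal 1, the sketch below compares a population mean with an estimate from a simple random sample. It runs on simulated delays rather than the course database, and the column name arr_delay, the sample size, and the choice of simple random sampling are all assumptions made for the example; your question may call for a different design.

```r
# Simulated stand-in for the flight data: arrival delays in minutes.
# (Hypothetical values and column name; the real data live in the course database.)
set.seed(1)
population <- data.frame(arr_delay = rexp(100000, rate = 1 / 15) - 5)

# (a) Population-based answer: use every row.
pop_mean <- mean(population$arr_delay)

# (b) Sample-based answer: simple random sample without replacement.
N <- nrow(population)
n <- 1000
srs <- population[sample(N, n), , drop = FALSE]
est <- mean(srs$arr_delay)

# Standard error with the finite population correction, followed by
# an approximate 95% confidence interval for the population mean.
se <- sqrt((1 - n / N) * var(srs$arr_delay) / n)
ci <- est + c(-1, 1) * qnorm(0.975) * se

c(population = pop_mean, estimate = est, lower = ci[1], upper = ci[2])
```

The interval is the "quantification of uncertainty" for the sample-based answer; comparing it with the population value is one simple way to check how well the sampling approach performed.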

Project Data

Data for this project come from the Bureau of Transportation Statistics Airline On Time Statistics.

The data are hosted in a PostgreSQL database; see the Getting the data section below. You may also find the following helpful:

Project Deliverables

  1. By Wednesday, April 30 at 9:00am: Be able to discuss some ideas about population-based approaches and sampling-based approaches with the entire class.

  2. By Monday, May 12 at 9:00am: Submit a two-page (reasonable margins and font size) document that summarizes the following:

    • Project/data background and the question that you addressed.
    • Your findings from the population-based approach, including a discussion of all the assumptions you made.
    • Your findings from the sample-based approach, including a discussion of all the assumptions you made.
    • A comparison of the two approaches, as well as obstacles you encountered while implementing one or both approaches, and useful/interesting solutions for overcoming those obstacles.

  3. During the week of May 12: Make a 20-minute presentation of your project. The order in which teams present will be determined randomly during the week of May 5. Each presentation must consist of four sections: (1) Overview and Question of Interest; (2) Population-based Findings; (3) Sample-based Findings; (4) Discussion, Obstacles and Solutions. The sections need not be of equal length, but the total presentation must not exceed 20 minutes. All members of your team must be prepared to deliver all four sections; which team member presents which section will be determined randomly immediately before the presentation begins.

    Presentation slides must be made available in PDF format so that they can be posted on the course website. Send them to Charlotte or Alix before 9:00am on the day your group presents.

  4. By Friday, May 16 at 9:50am: Give Charlotte and Alix access to your Git repository, which must contain the well-documented R code needed to completely reproduce the content of your summary report and your presentation.

  5. By Friday, May 16 at 9:50am: Each group member must turn in a completed Group Member Evaluation Form for every other member of their team.

Getting the data

  1. Install PostgreSQL from: http://www.enterprisedb.com/products-services-training/pgdownload

  2. Install the R package RPostgreSQL

  3. Then run the following code (you shouldn’t get any errors):

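library(dplyr)

# Connection details for the course database
endpoint <- "flights.cwick.co.nz"
user <- "student"
password <- "password"

# Connect to the "ontime" PostgreSQL database (uses the RPostgreSQL package)
ontime <- src_postgres("ontime",
  host = endpoint,
  port = 5432,
  user = user,
  password = password)

# Reference the remote "flights" table
flights <- tbl(ontime, "flights")

# Preview the first few rows to confirm the connection works
as.tbl(head(flights))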

You can then treat flights like a data.frame and use the dplyr verbs. More details in class.
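
To give a feel for what using the dplyr verbs looks like, here is a minimal sketch on a small local data frame standing in for flights. The column names carrier and arr_delay and the values are hypothetical and are not taken from the database schema; check the actual column names before adapting it.

```r
library(dplyr)

# Small local stand-in for the remote table. The column names
# (carrier, arr_delay) are hypothetical, not the database's schema.
flights_local <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "DL", "DL"),
  arr_delay = c(10, -5, 30, NA, 0, 12)
)

delay_by_carrier <- flights_local %>%
  filter(!is.na(arr_delay)) %>%        # drop flights with no recorded delay
  group_by(carrier) %>%                # one group per carrier
  summarise(n_flights = n(),
            mean_delay = mean(arr_delay)) %>%
  arrange(desc(mean_delay))            # largest average delay first

delay_by_carrier
```

The same chain of verbs can be applied to the remote flights table, where dplyr translates it to SQL and runs it in the database; adding collect() at the end of the chain pulls the result into R as a local data frame.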