Project Report Due: 9:00am Monday May 12
Project Presentations: Week of May 12
The goals of this project are to work with your team:
To answer a specific question assigned to your group using the on-time flight data from the Bureau of Transportation Statistics. The question should be answered two ways: (a) by using all of the data to produce a population-based answer and (b) by using an appropriate sampling technique to produce a sample-based answer, with an appropriate quantification of uncertainty.
To produce a short report summarizing your findings.
To make a presentation giving more details about your findings to the entire class.
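As an illustration of the two kinds of answer described in the goals above, here is a minimal sketch in base R. It assumes a simple random sample without replacement and uses synthetic data; the column name arr_delay and the simulated values are hypothetical stand-ins, not the actual BTS data, and your group's question may call for a different sampling design.

```r
# Synthetic stand-in for a population of arrival delays in minutes.
# This is simulated data, NOT the real on-time flight data.
set.seed(1)
N <- 100000
population <- data.frame(arr_delay = rexp(N, rate = 1 / 12) - 5)

# (a) Population-based answer: use every row.
pop_mean <- mean(population$arr_delay)

# (b) Sample-based answer: simple random sample without replacement.
n <- 1000
srs <- population[sample(N, n), , drop = FALSE]
est <- mean(srs$arr_delay)

# Standard error of the sample mean with the finite population correction,
# and a normal-approximation 95% confidence interval.
se <- sqrt((1 - n / N) * var(srs$arr_delay) / n)
ci <- est + c(-1, 1) * qnorm(0.975) * se

c(population = pop_mean, estimate = est, lower = ci[1], upper = ci[2])
```

Comparing the interval with the population value is one way to check how well the sampling approach performs.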
Data for this project come from the Bureau of Transportation Statistics Airline On Time Statistics.
The data are hosted in a PostgreSQL database; see the Getting the Data section below. You may also find the following helpful:
By Wednesday, April 30 at 9:00am: Be able to discuss some ideas about population-based approaches and sampling-based approaches with the entire class.
By Monday, May 12 at 9:00am: Submit a two-page (reasonable margins and font size) document that summarizes the following:
During the week of May 12: Make a 20-minute presentation of your project. The order in which teams make presentations will be determined randomly during the week of May 5. Each presentation must consist of four sections: (1) Overview and Question of Interest; (2) Population-based Findings; (3) Sample-based Findings; (4) Discussion, Obstacles and Solutions. These sections need not be of equal length, but the total presentation must not exceed 20 minutes. All members of your team must be prepared to deliver all four sections; which team member presents which section will be determined randomly immediately before the presentation begins.
Presentation slides must be made available in PDF format so that they can be posted on the course website—these should be sent to Charlotte or Alix before 9:00am on the day your group presents.
By Friday, May 16 at 9:50am: You must give Charlotte and Alix access to your Git repository, which must contain the well-documented R code needed to completely reproduce the content of your summary report and your presentation.
By Friday, May 16 at 9:50am: Each group member must turn in a completed Group Member Evaluation Form for every other member of their team.
Getting the Data
Install PostgreSQL from: http://www.enterprisedb.com/products-services-training/pgdownload
Install the R package RPostgreSQL
Then run the following code (you shouldn’t get any errors):
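# Load dplyr, which provides the database helpers and verbs used below
library(dplyr)

# Connection details for the course database
endpoint <- "flights.cwick.co.nz"
user <- "student"
password <- "password"

# Connect to the "ontime" PostgreSQL database
ontime <- src_postgres("ontime",
  host = endpoint,
  port = 5432,
  user = user,
  password = password)

# Reference the remote "flights" table and preview its first few rows
flights <- tbl(ontime, "flights")
as.tbl(head(flights))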
You can then treat flights like a data.frame and use the dplyr verbs. More details in class.
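To give a flavor of the dplyr verbs mentioned above, here is a small sketch that runs on a local data frame rather than the course database. The data frame flights_demo and its column names (carrier, arrdelay) are hypothetical and chosen only for illustration; check the actual column names in flights before adapting it.

```r
library(dplyr)

# Hypothetical stand-in for the remote table; column names are assumptions.
flights_demo <- data.frame(
  carrier  = c("AA", "AA", "UA", "UA", "UA"),
  arrdelay = c(10, -5, 30, NA, 0)
)

# Drop missing delays, then summarise the delay by carrier.
delay_by_carrier <- flights_demo %>%
  filter(!is.na(arrdelay)) %>%
  group_by(carrier) %>%
  summarise(n_flights = n(), mean_delay = mean(arrdelay)) %>%
  arrange(desc(mean_delay))

delay_by_carrier
```

When the same verbs are applied to a remote table such as flights, dplyr builds a SQL query and runs it in the database; calling collect() on the result brings it into R as a local data frame.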