Oregon State University
Spring 2014
Lectures: MWF 9:00-9:50, Owen 103
Instructors:
Charlotte Wickham, 76 Kidder charlotte.wickham@stat.oregonstate.edu
Alix Gitelman, 48 Kidder gitelman@stat.oregonstate.edu
Office hours:
Wickham: 1-2pm WF in 76 Kidder
Gitelman: 2-3pm M in 48 Kidder
Reading:
dplyr vignette. Install the dplyr package in R, then open the vignette from the help system (type ??dplyr at the command line and click on dplyr::introduction). Read through the vignette and run all of the commands.
Large Datasets and You: A Field Guide
The Split-Apply-Combine Strategy for Data Analysis by H. Wickham
Reading:
Eight (No, Nine!) Problems With Big Data
Big data and big business: Should statisticians join in?
Why Big Data is Bad for Science
Is Big Data an Economic Big Dud?
Where Does a Statistician Fit in the Big Data Era?
Reading:
Performance of R. Read at least the sections Introduction, Why is R slow?, Microbenchmarking, and Implementation performance. There are some suggested exercises; do some!
Memory in R. Read at least the sections Memory, object.size(), Total memory use, and Garbage collection. Again, try some of the exercises.
git. If you haven't already, read at least the first tutorial linked.
(optional) Chapter 14 in The Art of R Programming by Norman Matloff. Another discussion of speed and memory in R that you might find useful.
Reading:
Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. The foundational paper attempting to rank perceptual tasks.
A Tour through the Visualization Zoo. Some ideas for more exotic visualizations.
Infovis and Statistical Graphics: Different Goals, Different Looks. A good discussion by Andrew Gelman and Antony Unwin of the varying goals of graphics.
Reading:
Big Data: are we making a big mistake?
How are databases efficient? Read the answers and follow a few links.
Reading:
Chapter 1 from Machine Learning by K. Murphy
Bias-variance tradeoff. A great tutorial on prediction error.
Measuring error. A great tutorial on measuring prediction error.
Cross validation. Nice slides illustrating cross validation.
Reading:
Big Data tools. Read about the tool assigned to you and submit on Blackboard by Friday, June 6. Read about a few others too!