Oregon State University
Spring 2014
Lectures: MWF 9:00-9:50, OWEN 103
Charlotte Wickham, 76 Kidder, charlotte.wickham@stat.oregonstate.edu
Alix Gitelman, 48 Kidder, gitelman@stat.oregonstate.edu
Office hours:
Wickham: 1-2pm WF in 76 Kidder
Gitelman: 2-3pm M in 48 Kidder
dplyr vignette: Install the dplyr package in R, then open the dplyr vignette from the help system (type ??dplyr at the command line and click on dplyr::introduction). Read through the vignette and run all of the commands.
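The install-and-open steps can also be done entirely from the R console. This is a minimal sketch, assuming an internet connection; the CRAN mirror URL is one possible choice, and the vignette name "introduction" is taken from the instructions above and may differ in other versions of dplyr.

```r
# Install dplyr only if it is not already available.
# The mirror below is an assumption; any CRAN mirror works.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# List the vignettes shipped with the installed version of dplyr.
vignette(package = "dplyr")

# Open the introductory vignette. The name "introduction" is assumed from
# the instructions above; if R warns that it is not found, pick a name
# from the listing printed by the previous command.
vignette("introduction", package = "dplyr")
```

If the vignette name has changed, browseVignettes("dplyr") opens a browser page listing whatever vignettes the installed version provides.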
Large Datasets and You: A Field Guide
The Split-Apply-Combine Strategy for Data Analysis by H. Wickham
Eight (No, Nine!) Problems With Big Data
Big data and big business: Should statisticians join in?
Why Big Data is Bad for Science
Is Big Data an Economic Big Dud?
Where Does a Statistician Fit in the Big Data Era?
Performance of R: At least read the sections Introduction, Why is R slow?, Microbenchmarking, and Implementation performance. There are some suggested exercises; do some!
Memory in R: At least read the sections Memory, object.size(), Total memory use, and Garbage collection. Again, try some of the exercises.
git: If you haven't already, read at least the first tutorial linked.
(optional) Chapter 14 in The Art of R Programming by Norman Matloff: Another discussion of speed and memory in R that you might find useful.
Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. The foundational paper attempting to rank perceptual tasks.
A Tour through the Visualization Zoo. Some ideas for more exotic visualizations.
Infovis and Statistical Graphics: Different Goals, Different Looks. A good discussion by Andrew Gelman and Antony Unwin of the varying goals of graphics.
Big Data: are we making a big mistake?
How are databases efficient? Read the answers and follow a few links.
Chapter 1 from Machine Learning by K. Murphy
Bias-Variance tradeoff: A great tutorial on prediction error.
Measuring error: A great tutorial on measuring prediction error.
Cross validation: Nice slides illustrating cross validation.
Big Data tools: Read about the tool assigned to you and submit on Blackboard by Friday, June 6. Read about a few others too!