Data Wrangling

This workshop will prepare you for dealing with messy data by walking you through real-life examples. We will work on improving your programming skills and help you move beyond copy-and-paste. We will discuss how to write functions to reduce duplication in your code and automate common tasks, and how to use iteration to reduce duplication further. You will leave with skills that allow you to tackle data problems with more ease.
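
As a rough illustration of the idea of replacing copy-and-paste with a function plus iteration, here is a minimal sketch in base R. The data and the names measurements and rescale01 are hypothetical and are not taken from the workshop materials.

```r
# Hypothetical example data: three numeric vectors standing in for columns.
measurements <- list(
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  age    = c(20, 30, 40)
)

# Write the repeated logic once: rescale a numeric vector to the range [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Iterate over every element instead of copying the formula once per column.
rescaled <- lapply(measurements, rescale01)

# vapply() returns a plain vector and checks that each result is one number.
means <- vapply(rescaled, mean, numeric(1))
```

If the rescaling rule changes, only rescale01 needs editing, which is the main payoff over pasting the same formula three times.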

The course will be data-centric, drawing on many different data sets to illustrate the techniques suited to different problems.

Timetable

Time	Topic	Description and Resources
9 - 9:15	Introduction	reading in basic file types: .xls, .csv, .txt, .xport and more general functions: filter, join, …
9:15 - 10:05	Reading Files	Excel files vs. text, data organization 2-Files.R, midwest.csv, midwest.xls
10:05 - 10:30	Break
10:30 - 12:15	Summarizing with dplyr	Pipe operator and dplyr verbs 3-dplyr.R pitch.csv
12:15 - 1:15	Lunch Break (on your own)
1:15 - 2:45	Tidy Data	Restructuring data with pivot_wider, pivot_longer, and separate. 4-tidyr.R, frenchfries.csv, billboard.csv, flights.csv, occupation-1870.csv
2:45 - 3:00	Break
3:00 - 3:55	Joining Data	Join dataframes together using SQL-based logic 5-joining.R, boxoffice.csv, baseball.csv
3:55 - 4:00	Evaluation	Help us make the workshops better!
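
For a flavor of how the afternoon topics fit together, the sketch below chains a reshape, a join, and a grouped summary with the pipe operator. It assumes the dplyr and tidyr packages are installed; the tables scores and cohorts are made-up toy data, not one of the workshop data sets.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in wide form: one column per subject.
scores <- tibble(
  student = c("a", "b", "c"),
  math    = c(90, 80, 70),
  art     = c(60, 75, 85)
)

# Hypothetical lookup table sharing the key column `student`.
cohorts <- tibble(
  student = c("a", "b", "c"),
  cohort  = c("x", "x", "y")
)

# Wide to long: one row per student-subject pair.
long <- scores %>%
  pivot_longer(cols = c(math, art), names_to = "subject", values_to = "score")

# Join on the shared key, then summarize within each cohort.
summary_tbl <- long %>%
  left_join(cohorts, by = "student") %>%
  group_by(cohort) %>%
  summarize(mean_score = mean(score), .groups = "drop")
```

Reshaping to long form first means the grouped summary needs no per-subject code, whatever the number of subject columns.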

Your Turn Solutions

Useful Links

The Split-Apply-Combine Strategy for Data Analysis, Journal of Statistical Software, 2011
Overview of base apply functions
Dplyr and Tidyr Cheat Sheet