Intro
Data wrangling makes it possible to combine and work with data that arrives in many different formats.
Data wrangling is the process of cleaning, organizing, and transforming raw data into the format analysts need for prompt decision-making. It is also known as data cleaning.
Improve data usability by converting raw data into a compatible format for the end system
Quickly build data flows within an intuitive user interface
Schedule and automate the data-flow process
Integrate various types of information and their sources (like databases, web services, files, etc.)
Process very large volumes of data and easily share data-flow techniques
Happy families are all alike; every unhappy family is unhappy in its own way. - Leo Tolstoy
Five main ways tables of data tend not to be tidy:
Column headers are values, not variable names.
Multiple variables are stored in one column.
Variables are stored in both rows and columns.
Multiple types of observational units are stored in the same table.
A single observational unit is stored in multiple tables.
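As a rough illustration of the first two problems in the list, here is a minimal sketch using the tidyr package. The tables cases_wide and rates, and all of their column names and values, are made up for this example.

```r
library(tidyr)

# Hypothetical untidy table: the year columns are values, not variable names.
cases_wide <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25),
  check.names = FALSE
)

# pivot_longer() moves the year headers into a single "year" column.
cases_long <- pivot_longer(
  cases_wide,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)

# Hypothetical untidy table: "rate" stores two variables as "cases/population".
rates <- data.frame(
  country = c("A", "B"),
  rate = c("10/1000", "20/4000")
)

# separate() splits the column at "/" into two variables;
# convert = TRUE turns the resulting text into numbers.
rates_tidy <- separate(
  rates,
  rate,
  into = c("cases", "population"),
  sep = "/",
  convert = TRUE
)
```

After these steps, each variable has its own column and each observation has its own row, which is the shape the dplyr verbs expect.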
READING FILE TYPES - What file types can be read in with R? - Reading in different file types - Formatting your data: A tidy data discussion
(Review from Graphics Lecture)
DPLYR PACKAGE - filter, mutate, select, summarise, group_by, and arrange
TIDYR PACKAGES - What is tidy data? - pivot_longer, pivot_wider, and separate functions - lubridate package basics
JOINING DATASETS - Basic set theory logic (joining/combining datasets)
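To give a feel for how set theory logic maps onto joins, here is a small sketch using dplyr's join functions. The students and scores tables and their contents are hypothetical, and the key column id is an assumption made for this example.

```r
library(dplyr)

# Hypothetical example tables that share the key column "id".
students <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Intersection: only ids present in both tables.
both <- inner_join(students, scores, by = "id")

# Everything in students; score is NA where there is no match.
all_students <- left_join(students, scores, by = "id")

# Union: every id that appears in either table.
either <- full_join(students, scores, by = "id")

# Difference: students that have no matching row in scores.
no_score <- anti_join(students, scores, by = "id")
```

Which join to use depends on which rows should survive: inner_join keeps only matches, while left_join and full_join keep unmatched rows and fill the gaps with NA.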