2025-02-27
Plan for Thursdays:
Are there more questions for the Graphics homework?
… next chapter
data is the central object:
what is data
?
every action we do, takes a data set as an argument and returns a (modified) data set
data in -> action -> data out
pipe operator %>%
or |>
let’s us string these actions together
data_in %>% action1 %>% action2 %>% action3 -> data_out
(
dot approach .
)
(in python)data_out = ( data_in.action1.action2.action3.
action4 )
Read as ‘then’
small number of actions that work together (think old-timey LEGO, not play mobile)
get a subset of the rows
get a subset of the columns
transform/add variables
calculate numeric summaries
introduce special groupings
small number of actions that work together (think old-timey LEGO, not play mobile)
get a subset of the rows: filter
get a subset of the columns: select
transform/add variables: mutate
calculate numeric summaries: summarize
introduce special groupings: group_by
small number of actions that work together (think old-timey LEGO, not play mobile)
get a subset of the rows: query
get a subset of the columns: select by name or index
transform/add variables: assign
calculate numeric summaries: describe
introduce special groupings: groupby
(and agg
or expanding
)
Grouping acts as a multiplier for each of the other actions
Grouping by (usually categorical) variable(s) allows us to apply all of the other actions more specifically to each (combination of the) level(s) of variable(s)
Example (summarize
):
summary across a whole dataset
find the average height
becomes group specific
find the average height for men and women (group by gender)
find the average height by sport (group by athletic discipline)
find the average height by sport and sex
Grouping by (usually categorical) variable(s) is syntactic glue that allows us to apply all of the other actions more specifically
Example (filter
):
filter across a whole dataset
find the top earner
becomes group specific
find the top earner among men and women
find the top earner in each athletic discipline
find the top earner among men and women in each athletic discipline
You’re ready to tackle the homework on data manipulation
Continue working on homework assignment
Start of (take-home) midterm week
Prep with the practice exam, look over the covered topics