2024-04-10
What are the attributes of tidy data?
Which packages do you find handy in R/Python for data cleaning?
What are the commonly used functions in those packages, and what are they for?
Each variable has its own column
Each observation has its own row
Each value has its own cell
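A minimal sketch of the three rules above, in plain Python. The table, its column names, and the helper `to_tidy` are all made up for illustration. The "wide" input hides one variable (the year) in its column headers; the reshaped output gives every variable a column and every observation a row.

```python
# Hypothetical "wide" table: the year is stored in the column names,
# so one row holds several observations.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_col, var_name, value_name):
    """Turn each (id, column) pair of a wide table into its own row."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the id column is repeated on every output row
            tidy_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return tidy_rows


tidy = to_tidy(wide, id_col="country", var_name="year", value_name="cases")
for record in tidy:
    print(record)
```

Note that `year` comes out as text here because it started life as a column name; it would still need converting to a number before doing arithmetic on it.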
dplyr
filter: operates row-wise; drops unwanted rows or keeps the wanted ones
select: operates column-wise; keeps the wanted columns
mutate: creates a new column (or overwrites an existing one)
there is more…
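To make the row-wise vs column-wise distinction concrete, here is a rough sketch of what the three verbs do, written in plain Python on a list of dicts. This is not dplyr's API, only an illustration of the idea; the data and column names are invented.

```python
# Hypothetical data: one dict per row.
rows = [
    {"name": "Ann", "city": "Oslo", "height_cm": 170, "weight_kg": 60},
    {"name": "Bob", "city": "Rome", "height_cm": 180, "weight_kg": 90},
    {"name": "Cy", "city": "Lima", "height_cm": 160, "weight_kg": 50},
]

# filter: row-wise, keep only the rows that meet a condition
filtered = [r for r in rows if r["height_cm"] >= 170]

# select: column-wise, keep only the wanted columns
wanted = ("name", "height_cm", "weight_kg")
selected = [{col: r[col] for col in wanted} for r in filtered]

# mutate: add a new column computed from existing ones
mutated = [
    {**r, "bmi": round(r["weight_kg"] / (r["height_cm"] / 100) ** 2, 1)}
    for r in selected
]

for r in mutated:
    print(r)
```

Each step takes a table and returns a new table, which is why these operations chain together so naturally.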
lubridate
Deals with dates and times
Pay attention to the data type!
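A small sketch of why the data type matters for dates, using Python's standard `datetime` module (in R, lubridate parsers such as `dmy()` play the same role). The date strings are made up and assumed to be day/month/year. While they are still text they sort alphabetically, not chronologically, and cannot be subtracted.

```python
from datetime import datetime

# Hypothetical dates stored as text in day/month/year order.
raw = ["10/04/2024", "02/11/2023", "25/12/2023"]

# Sorting the strings compares characters, so the order is not chronological.
by_text = sorted(raw)

# Parse into real date objects first; then sorting and arithmetic behave.
parsed = [datetime.strptime(s, "%d/%m/%Y").date() for s in raw]
by_date = sorted(parsed)

# Date arithmetic only works on the parsed values.
span_days = (by_date[-1] - by_date[0]).days

print("sorted as text :", by_text)
print("sorted as dates:", [d.isoformat() for d in by_date])
print("span in days   :", span_days)
```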
Manipulating data in Python and base R is a bit more complicated