Data cleaning

2024-04-10

We have read

Data Cleaning

Questions For You

  • What are the attributes of tidy data?

  • Which packages in R/Python did you find handy for data cleaning?

  • What are the commonly used functions in these packages, and what is each one for?

Tidy data

  • Each variable has its own column

  • Each observation has its own row

  • Each value has its own cell
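To make the three rules concrete, here is a minimal Python sketch (standard library only) that reshapes a made-up "wide" table into tidy form. In the wide table the year variable is spread across the column headers, so one row holds several observations. The data and the `to_long` helper are hypothetical and only illustrate the idea; in R this reshaping is usually done with `tidyr::pivot_longer`.

```python
# Made-up "wide" table: the year variable is hidden in the column headers,
# so each row contains two observations (one per year).
wide = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 30, "2000": 35},
]


def to_long(rows, id_col, var_name, value_name):
    """Hypothetical helper: reshape wide rows into tidy rows.

    Every non-id column becomes its own row, so each variable gets a column,
    each observation a row, and each value a single cell.
    """
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            # Convert the header text to an int so "year" is numeric, not a string.
            long_rows.append({id_col: row[id_col], var_name: int(key), value_name: value})
    return long_rows


tidy = to_long(wide, "country", "year", "cases")
for r in tidy:
    print(r)
# {'country': 'A', 'year': 1999, 'cases': 10}
# {'country': 'A', 'year': 2000, 'cases': 12}
# {'country': 'B', 'year': 1999, 'cases': 30}
# {'country': 'B', 'year': 2000, 'cases': 35}
```

After reshaping, "country", "year" and "cases" are each a single column, which is what tools like dplyr expect as input.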

Packages and Functions

  • dplyr

    • filter: operates row-wise; keeps the rows that meet a condition and drops the unwanted ones

    • select: operates column-wise; keeps the wanted columns

    • mutate: creates a new column (or modifies an existing one)

    • and more…

      • group_by, summarise, statistical functions…
  • lubridate

    • Deals with dates and times

    • Pay attention to the data type: a date stored as a character string is not a date!

  • Manipulating data is a bit more complicated in Python and base R

    • lambda functions and the apply family
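As a rough illustration of why base Python takes more work, the sketch below mimics the dplyr verbs and a lubridate-style date parse using only the standard library: the built-ins `filter` and `map` with `lambda` (loosely comparable to R's apply family), a dict comprehension for column selection, and `datetime.strptime` for dates. The records and the `parse_date` helper are made up for illustration; this is not how dplyr or lubridate work internally. The last lines show the data-type pitfall: dates kept as strings sort alphabetically, not chronologically.

```python
from datetime import datetime
from statistics import mean

# Made-up records; "date" arrives as a character string, as it often does from a CSV.
rows = [
    {"id": 1, "date": "03/15/2024", "group": "a", "value": 4.0},
    {"id": 2, "date": "11/02/2023", "group": "b", "value": 7.5},
    {"id": 3, "date": "01/20/2024", "group": "a", "value": 2.5},
    {"id": 4, "date": "12/31/2023", "group": "b", "value": 9.0},
]


def parse_date(text):
    """Hypothetical helper: turn an MM/DD/YYYY string into a real datetime."""
    return datetime.strptime(text, "%m/%d/%Y")


# filter-like step: keep the rows whose value exceeds 3 (row-wise)
kept = list(filter(lambda r: r["value"] > 3, rows))

# select-like step: keep only the wanted columns (column-wise)
selected = [{k: r[k] for k in ("id", "date", "value")} for r in kept]

# mutate-like step: add a new column computed from an existing one
mutated = list(map(lambda r: {**r, "parsed": parse_date(r["date"])}, selected))

# group_by + summarise-like step: mean value per group
groups = {}
for r in rows:
    groups.setdefault(r["group"], []).append(r["value"])
summary = {g: mean(vals) for g, vals in groups.items()}

# Data-type pitfall: string order vs. chronological order
by_string = [r["id"] for r in sorted(rows, key=lambda r: r["date"])]
by_date = [r["id"] for r in sorted(rows, key=lambda r: parse_date(r["date"]))]

print([r["id"] for r in mutated])  # [1, 2, 4]
print(summary)                     # {'a': 3.25, 'b': 8.25}
print(by_string)                   # [3, 1, 2, 4]  (2024 dates land before 2023 ones)
print(by_date)                     # [2, 4, 3, 1]  (true chronological order)
```

Each step needs an explicit comprehension, lambda, or loop, whereas dplyr expresses the same operations as single named verbs.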

Next Time

Homework 9

  • Use github classroom to accept the assignment
  • Don’t forget to commit and push your changes!