```{r}
# Credit to Kasia; lightly edited to create an output file and test plot
# Blog post at https://r-tastic.co.uk/post/from-messy-to-tidy/
library(rvest)
library(dplyr)

url <- "https://www.nu3.de/blogs/nutrition/food-carbon-footprint-index-2018"

# scrape the website
url_html <- read_html(url)

# extract the HTML table
whole_table <- url_html %>%
  html_nodes("table") %>%
  html_table(fill = TRUE) %>%
  .[[1]]

table_content <- whole_table %>%
  select(-X1) %>%                         # remove redundant column
  filter(!dplyr::row_number() %in% 1:3)   # remove redundant rows

# the column headers are stored as icon titles, not as table text
raw_headers <- url_html %>%
  html_nodes(".thead-icon") %>%
  html_attr("title")

# bottom-row header labels, one per data column
tidy_bottom_header <- raw_headers[28:length(raw_headers)]
# tidy_bottom_header[1:10]

# middle-row header labels (the food categories)
raw_middle_header <- raw_headers[17:27]
# raw_middle_header

# each food spans two columns; the three totals span one column each
tidy_headers <- c(
  rep(raw_middle_header[1:7], each = 2),
  "animal_total",
  rep(raw_middle_header[8:length(raw_middle_header)], each = 2),
  "non_animal_total",
  "country_total")
# tidy_headers

# combine middle and bottom headers into "food;measure" column names
combined_colnames <- paste(tidy_headers, tidy_bottom_header, sep = ";")
colnames(table_content) <- c("Country", combined_colnames)

# convert the 25 data columns (2 to 26) from character to numeric
table_content <- table_content %>%
  mutate(across(2:26, as.numeric))
```
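The header step is the least obvious part of the code above: `rep(..., each = 2)` repeats each food name once per measure column, and `paste(..., sep = ";")` glues each food to its measure. The sketch below reproduces that logic on a toy example; the food names and the `consumption`/`co2` measure labels are hypothetical stand-ins, not the strings scraped from the page.

```r
# Hypothetical stand-ins for the scraped middle (food) and bottom (measure) headers
foods    <- c("Pork", "Beef")
measures <- rep(c("consumption", "co2"), times = 2)

# Each food spans two columns, so repeat each food name twice...
food_headers <- rep(foods, each = 2)

# ...then pair each repeated name with its measure, separated by ";"
combined <- paste(food_headers, measures, sep = ";")
print(combined)
```

Because every combined name has the form `food;measure`, the two pieces can later be split apart again on the `;`.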
Homework 10: Data Transformations
Note: This assignment must be submitted in GitHub Classroom.
This week’s assignment uses data from Tidy Tuesday (link) and relates to food consumption and CO2 emissions.
The code above reads the data in from the original webpage and gets it into tabular form.
Your job is to complete the following tasks:

1. Describe the state of the data set, `table_content`.
    - What are the variables in the data set?
        - var1
        - var2
        - (add more as necessary)
    - Is it in tidy form? What principles of tidy data does this violate?

      Your answer here

    - What steps do you need to take to get it into tidy form?
        - (add more steps as necessary)
2. Sketch out what the final (tidy) data set will look like. You can use markdown table syntax or a picture here, but if you use a picture, upload it to imgur and include the image link in this document USING PROPER MARKDOWN SYNTAX.
3. Write R or Python code for each step in the process you identified in task 1. Show what the data looks like at each step using `head()`. Each step should be in a different code chunk.
4. For each food type (you may have to remove total values), plot the relationship between carbon output and consumption (use facets to get separate plots for each type of food). What do you notice for each plot? If you want to reduce carbon emissions, what foods should you eat less of?
5. Look at the plot above again. Do you have any concerns about the data? The data source?
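As a starting point for tasks 1 and 3, here is one possible reshaping pipeline, sketched on a tiny hypothetical table shaped like `table_content` (one row per country, one column per `food;measure` pair, plus a total column). The country codes, the `consumption`/`co2` measure labels, and all values are placeholders, not real data. The sketch drops the totals, pivots to long form, splits the combined names with `tidyr::separate()`, and spreads the measures back into columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy table shaped like table_content; values are placeholders
wide <- tibble(
  Country = c("A", "B"),
  `Pork;consumption` = c(1, 2),
  `Pork;co2` = c(3, 4),
  `Beef;consumption` = c(5, 6),
  `Beef;co2` = c(7, 8),
  `animal_total;co2` = c(10, 12)
)

tidy <- wide %>%
  # drop total columns: they are sums of other columns, not observations
  select(-contains("total")) %>%
  # one row per country x food x measure
  pivot_longer(-Country, names_to = "food_measure", values_to = "value") %>%
  # split e.g. "Pork;co2" into food = "Pork" and measure = "co2"
  separate(food_measure, into = c("food", "measure"), sep = ";") %>%
  # one column per measure, so each row is one country-food observation
  pivot_wider(names_from = measure, values_from = value)

print(tidy)
```

On the real `table_content`, the measure columns will be named after the scraped bottom-header strings, so renaming them (for example with `dplyr::rename()`) may make the plotting step easier; a faceted scatter plot for task 4 could then use ggplot2's `facet_wrap(~ food)`.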