Creating Good Graphics

Stat 251

2026-02-19

Homework: Graphics

  • How does the homework tie to the readings?

  • Work on skills: how to make graphics

  • Work on concepts: what are mappings, and how do they affect the conclusions?

  • Work on presentation: what are more effective ways of presenting information?

Homework rubric

Rubric for the homework assignment:

  • Data exploration: 2 points (includes each variable explored and each plot described in 1-2 sentences)
  • Grammar of graphics, probabilities: 1.5 points
  • Grammar of graphics, agreement: 1.5 points
  • Ugly chart in R: 2.5 points (includes the explanation of why the chart is ugly)
  • Ugly chart in Python: 2.5 points (includes the explanation of why the chart is ugly)

Graphics week homework

  • Part I: Make visual summaries for two new data sets and think about mappings

  • Part II: Use your knowledge to create the worst chart you can!

Data Exploration

Make charts for all variables that are listed by name:

groundhogs.csv

  • lat, long
  • country
  • isGroundhog
  • active
  • predictionsCount

predictions.csv

  • isGroundhog
  • year
  • shadow

Variable active

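library(tidyverse)
# read the groundhog data (URL pinned to a specific commit of the repository)
groundhogs <- read.csv("https://raw.githubusercontent.com/stat-assignments/eda-groundhogs/97ab0e01b64aa3a1749247983a9b05a0c30b5c0c/groundhogs.csv")
# bar chart: active is mapped to x, geom_bar() counts the rows in each category
groundhogs %>% ggplot(aes(x = active)) + geom_bar()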

This is a bar chart of the variable active: active is mapped to the x axis, and the count for each category (the height of each bar) is mapped to y. Finding: very few groundhogs (2) are not active.

Probabilities and Agreement

Answer the following two questions using charts. Explain your chart, and explain how it answers the question.

Do different groundhogs have different probabilities of predicting 6 more weeks of winter?

How much do North American groundhogs tend to agree on their predictions?

Do different groundhogs have different probabilities of predicting 6 more weeks of winter?

predictions <- read.csv("https://raw.githubusercontent.com/stat-assignments/eda-groundhogs/refs/heads/main/groundhog-predictions.csv")

predictions %>% 
  mutate(name = reorder(factor(name), name, length)) %>%
  ggplot(aes(x = name)) + geom_bar() +
  geom_bar(aes( weight = shadow), fill = "darkorange") + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) + 
  ggtitle("Number of predictions\nNumber of times seeing a shadow in orange")

What about missing values in the shadow variable?

How should we change the previous chart?

predictions <- read.csv("https://raw.githubusercontent.com/stat-assignments/eda-groundhogs/refs/heads/main/groundhog-predictions.csv")

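predictions %>% 
  filter(!is.na(shadow)) %>%                               # drop predictions without a recorded shadow
  mutate(name = reorder(factor(name), name, length)) %>%   # order groundhogs by their number of predictions
  ggplot(aes(x = name)) +
  geom_bar(aes(fill = factor(shadow)), position = "fill") + # "fill" scales every bar to proportions
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) 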

Limitations: different groundhogs have made very different numbers of predictions (and in different years).
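To see how uneven the prediction records are, they can be tabulated per groundhog. This is a minimal sketch, assuming predictions.csv has the columns `name`, `year` and `shadow` used in the charts above; `prediction_coverage()` is a hypothetical helper, not part of the assignment.

```r
library(dplyr)

# Hypothetical helper (not from the assignment): one row per groundhog with the
# number of predictions, how many have a missing shadow value, and the years covered.
# Assumes columns name, year and shadow, as used in the charts above.
prediction_coverage <- function(predictions) {
  predictions %>%
    group_by(name) %>%
    summarise(
      n_predictions    = n(),
      n_missing_shadow = sum(is.na(shadow)),
      first_year       = min(year),
      last_year        = max(year),
      .groups = "drop"
    ) %>%
    arrange(desc(n_predictions))   # longest records first
}
```

After reading `predictions` as in the chart code, `prediction_coverage(predictions)` lists the groundhogs with the longest records first; comparing `first_year` and `last_year` shows which groundhogs only cover recent years.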

How much do North American groundhogs tend to agree on their predictions?

For the years since 2010 … each year's predictions are split close to 50/50 between shadow and no shadow. That’s the least agreement we can possibly get!
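The 50/50 split can be checked numerically as well as visually. This is a sketch, assuming `shadow` is stored as a logical (or 0/1) column in predictions.csv; `shadow_share_by_year()` and `plot_shadow_share()` are hypothetical helpers.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: for each year from `from` on, the number of non-missing
# predictions and the share of them that saw a shadow.
# Assumes shadow is logical or coded 0/1 (so as.numeric() gives 0/1).
shadow_share_by_year <- function(predictions, from = 2010) {
  predictions %>%
    filter(year >= from, !is.na(shadow)) %>%
    group_by(year) %>%
    summarise(
      n            = n(),
      share_shadow = mean(as.numeric(shadow)),
      .groups = "drop"
    )
}

# Hypothetical helper: one bar per year; the dashed line at 0.5 marks
# the point of maximal disagreement.
plot_shadow_share <- function(shares) {
  ggplot(shares, aes(x = year, y = share_shadow)) +
    geom_col() +
    geom_hline(yintercept = 0.5, linetype = "dashed") +
    scale_y_continuous(limits = c(0, 1)) +
    labs(y = "share of predictions seeing a shadow")
}
```

`plot_shadow_share(shadow_share_by_year(predictions))` draws one bar per year; bars close to the dashed line mean the groundhogs disagree about as much as they possibly can.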

But … when we color the points by prediction, there seems to be some regional agreement.

Is this perceived agreement real?

Which plot shows the most geographic agreement?

The year was 2017; the real data is in panel 9.

… maybe there is not even regional agreement between the predictions.

Lineups help us to calibrate our eyes and distinguish random patterns from real visual findings.
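A lineup can be built by hand: hide the real map among decoy maps in which the shadow predictions have been shuffled across locations, so any regional pattern in a decoy is pure chance. This is a minimal sketch, assuming a data frame `one_year` with one row per groundhog for a single year and columns `long`, `lat` and `shadow` (building it requires joining groundhogs.csv and predictions.csv, which is not shown here); `make_lineup()` and `plot_lineup()` are hypothetical helpers. The nullabor package offers `lineup()` and `null_permute()` for the same purpose.

```r
library(ggplot2)

# Hypothetical helper: stack n copies of `one_year`; panel `pos` keeps the real
# predictions, all other panels get the shadow values randomly permuted.
# Assumes columns long, lat and shadow, one row per groundhog for a single year.
make_lineup <- function(one_year, n = 20, pos = sample(n, 1)) {
  panels <- lapply(seq_len(n), function(i) {
    d <- one_year
    if (i != pos) d$shadow <- sample(d$shadow)  # permutation breaks any real spatial pattern
    d$.sample <- i
    d
  })
  out <- do.call(rbind, panels)
  attr(out, "pos") <- pos  # position of the real data, for checking afterwards
  out
}

# Hypothetical helper: one small map per panel, colored by prediction;
# axis labels are hidden so they give no hint about the real panel.
plot_lineup <- function(lineup_data) {
  ggplot(lineup_data, aes(x = long, y = lat, colour = factor(shadow))) +
    geom_point() +
    facet_wrap(~ .sample) +
    theme(axis.text = element_blank(), axis.title = element_blank())
}
```

Look at `plot_lineup(make_lineup(one_year))` first and pick the panel that looks most regionally clustered; only then compare your choice with the `pos` attribute of the lineup data.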

Resources for this week

Homework

  • Explanations! What can you see from the plot? What is its purpose?