Homework: Data Visualization

HW
Author

Your Name

Note: This assignment must be submitted in github classroom.

Instructions

  • Once you have finished this assignment, render the document (Ctrl/Cmd-Shift-K or the Render button).

  • Commit the qmd file and any other files you have changed to the repository and push your changes.

  • In Canvas, submit a link to your github repository containing the updated files.

Swiss Banknotes

The R package mclust contains a data set called banknote, consisting of (physical) measurements on 200 Swiss bank notes, 100 of which are genuine, while the other half is counterfeit. I’ve saved this data as banknote.csv in this repository.

  1. Use one of the object inspecting functions and describe the data set - what do the variables appear to mean?

Hint: You can read about the dataset by running library(mclust) and then ?banknote - this will provide some additional context.

  1. Draw a barchart of Status. Also, map status to the fill color of the bars. (Yes, this plot is a bit simplistic, but what does it show?)

    Delete this ordered list and write something that includes the following details

    • what type of plot is it?
    • Which variables are mapped to x, to y, and to the (fill) color?
    • What is the main message of the plot: what is your main finding, i.e. what do you want viewers to learn from the plot?
    • Are there any anomalies or outliers?
  2. Draw a histogram of one of the variables in the dataset that shows a distinction between genuine and counterfeit banknotes. Use fill color to show this difference. Choose the binwidth such that there are no gaps in the middle range of the histogram.

    Delete this ordered list and write something that includes the following details

    • what type of plot is it?
    • Which variables are mapped to x, to y, and to the (fill) color?
    • What is the main message of the plot: what is your main finding, i.e. what do you want viewers to learn from the plot?
    • Are there any anomalies or outliers?
  3. Draw a scatterplot of two (continuous) measurements, color by whether the banknote is genuine or fake. Try to find a pair of measurements that provides as much separation as possible between the clusters of points for genuine and counterfeit banknotes.

    Delete this ordered list and write something that includes the following details

    • what type of plot is it?
    • Which variables are mapped to x, to y, and to the (fill) color?
    • What is the main message of the plot: what is your main finding, i.e. what do you want viewers to learn from the plot?
    • Are there any anomalies or outliers?

Take everything you know, and use it for evil

The textbook spent lots of time showing you how to create different types of graphics, but I spent a lot less time showing you all of the different ways you could customize graphics in any plotting library. In this problem, I want you to create the ugliest graphs you can, and then explain why, exactly, you made the decisions you did, and which principles of good graphics you’ve intentionally violated.

Ugliness is subjective, so the goal here is for you to explore the different ways you can customize the finer details of a plot.

Requirements: - Your finished masterpiece must have appropriate axis labels and a title (after all, even ugly plots need to be correctly labeled!). - Your ugliness should not depend solely on color choices - there are many other ways to make things ugly!

You are free to add additional variables and layers, modify the aesthetics used, and leverage other packages. If you use packages I haven’t used in the book, please provide (commented out) installation code in your code chunks so that I know where the package is from (this is especially important if it’s from GitHub and not CRAN or pip).

If you need inspiration, look here.

Use the palmerpenguins data, which has been exported into penguins.csv

Create the ugliest graph you can manage with each language. You can pick which variables to plot, what type of chart to make, and what aesthetic mappings to use.

Part 1: R

Delete this line and explain why your graph is ugly - what principles of good graphics have you violated?

Part 2: Python

Delete this line and explain why your graph is ugly - what principles of good graphics have you violated?

Useful References