Practice Exam (2023)

Exam
Week07
Author

Your Name

Published

February 27, 2024

Your goal on this exam is to demonstrate competency in as many of the objectives of chapters we’ve covered in this class so far as you can.

Ground Rules

  • You may use the textbook and the internet (but the same rules apply - you must be able to explain your answer!)

  • You may NOT confer with other people or AI entities - including posting on StackOverflow, Reddit, etc.

  • You may ask clarifying questions of Dr. Vanderplas by email/zoom or in person

  • You may use R or Python for any of these tasks, but your code must be reproducible - I must be able to run your quarto file on my machine. I have provided R chunks in the correct locations in this file - change them to Python if you wish.

  • For any plot or table you create, be sure to appropriately caption and label it, providing 1-2 sentences to highlight the main purpose/conclusions you can draw from that plot.

  • You should have at least one code chunk for each ## heading below.

Pockets

Jean pockets are useful for carrying a multitude of different items around, but this utility is not evenly shared: women’s pants often have pockets which are large enough to hold chapstick but not a pen, let alone a cell phone.

The Pudding assembled a dataset of jean pocket size measurements, which you can find here. There are two different data sets: the raw pocket measurements, and a second data file of measurements of the largest rectangle with could theoretically fit into the pocket of each pair of jeans evaluated.

Read in the Data

Write code which will:

  1. Download the raw pocket measurements.csv data to a file in this repository
  2. Read the data in, ensuring that columns containing numbers are appropriately formatted. You may consider the fabric column as a character string.

Conduct an Exploratory Data Analysis

Generate at least 3 questions about the data set and find answers to those questions using charts, tables, or numerical summaries.

Questions

Question Answers

Your discussion of this output goes here.

Your discussion of this output goes here.

Your discussion of this output goes here

Comparing Sexes and Styles

Generate one or two plots which best showcase the difference in pocket sizes for different sexes and styles of pants. Use The Pudding’s classification for pants styles, treating straight and boot-cut styles as the same and skinny and slim styles as the same.
You may annotate the plot with the output from statistical tests if you wish, but it is sufficient to highlight the visual differences. Your chart(s) must have appropriate titles and axis labels, and must be constructed to take into account the principles of good graphics discussed in the textbook.

Your discussion of this output goes here.

Replace this paragraph with 1-2 sentences discussing what choices you made to make the plot you generated above perceptually optimal. Which principles of good graphics did you use?

Cell Phone Dimensions

Dr. Vanderplas is interested in finding a pair of pants (from either the men’s section or the women’s section) which can accommodate her cell phone.

Here is some code to read in the data in R and Python and extract the rectanglePhone dataset.

jsonfile <- "https://raw.githubusercontent.com/the-pudding/data/master/pockets/measurementRectangles.json"

library(jsonlite)
library(tidyr)
rects <- fromJSON(jsonfile)
rectanglePhone <- rects %>% unnest("rectanglePhone")
jsonfile = "https://raw.githubusercontent.com/the-pudding/data/master/pockets/measurementRectangles.json"

import pandas as pd
rects = pd.read_json(jsonfile)
rectanglePhoneDims = pd.DataFrame.from_records(rects['rectanglePhone'], nrows = 80)
rectanglePhone = pd.concat([rects.iloc[:,0:21].reset_index(drop=True), rectanglePhoneDims], axis = 1)

Write a function checkpants(phonelength, phonewidth) which takes length and width in centimeters and uses the largest rectangle data set from The Pudding to identify pants compatible with a specific cell phone’s length and width. Return the following information in a list:

  • front - A data frame all jeans which can fit a rectangle of the specified dimension in the front pocket

  • back - a data frame of all jeans which can fit a rectangle of the specified dimension in the back pocket. Assume a phone fits in the back pocket if the width is less than the minWidthBack measurement and the minHeightBack measurement is at least 70% of the phone’s height.

  • price - The average price of a pair of jeans which can fit the specified dimension in either the front or back pocket

  • propMenFront - the proportion of men’s pants that can fit a rectangle of the specified dimension in the front pocket

  • propWomenAny - the proportion of women’s pants that can fit a rectangle of the specified dimension in any pocket

You should ensure that you carefully read the dataset documentation to understand the data you are using before you attempt this question.