Documenting Data

What are the critical components of data documentation?

  • Who collected the data
  • Why the data was collected
  • What the data is about
  • When the data was collected
  • Where the data was collected
  • How the data was generated/collected
  • Structure of the data
  • Formatting decisions in the data
  • Data validation/quality control
  • How the data can be reused/license
  • Suggested data analysis methods
  • Measurement instruments used
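
One way to keep these components from being forgotten is to store them next to the data in a machine-readable record. The sketch below is only an illustration, not a standard: the `DatasetDoc` class, its field names, and the `missing_fields` helper are hypothetical, chosen to mirror the list above, and the example values are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetDoc:
    """Hypothetical metadata record: one field per documentation component."""
    who: str = ""
    why: str = ""
    what: str = ""
    when: str = ""
    where: str = ""
    how: str = ""
    structure: str = ""
    formatting: str = ""
    quality_control: str = ""
    license: str = ""
    analysis_methods: str = ""
    instruments: str = ""


def missing_fields(doc: DatasetDoc) -> list[str]:
    """Return the names of the components that are still blank."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# Placeholder values for illustration only.
doc = DatasetDoc(who="lab research assistants", what="shoe sole wear over time")
print(missing_fields(doc))  # lists the ten components not yet filled in
```

A check like this can run before data is shared, so gaps in the documentation surface early rather than years later.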

Documentation is Project Dependent

Project 1: Building a Shoe Print Wear Database

  • 150 pairs of shoes
  • 2 brands of shoes
  • several sizes for each brand
  • step counters used on the shoes
  • questionnaires measuring activities
  • wearer weight/height/gait

Shoe Measurements

Initial measurement period + 2-3 additional measurement periods (~6 weeks between)

  • Photos of shoe soles
  • Digital shoe sole prints
  • Powder prints
    • Film
    • Paper
    • Vinyl flooring
  • 3D scans of shoe soles

Measurements taken in the lab by research assistants.

Important Documentation?

  • We probably should have recorded which research assistant was wearing the shoe for each measurement, along with their weight, height, gait, and so on.

Whoops.

A gif of a child wearing a lampshade walking into an oven and falling backwards

Documentation is Project Dependent

Project 2: Wire cuts

  • Goal is to estimate the length of sharp surfaces on all wire-cutting tools in people's homes

  • General survey with instructions for measuring each type of tool

  • Collected data is a list of tool types, with the number of blades, the number of cutting surfaces, and the number of tools of each type

  • Estimates are generated by adding up the total length of cutting surfaces
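
The estimate described above can be sketched as a sum over survey rows. This is only an illustration under a loud assumption: the collected fields listed above do not include a length, so the sketch adds a hypothetical `length_per_surface_cm` value to each row. The `ToolRecord` class, the function name, and the example rows are also made up for illustration and are not survey results.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    tool_type: str
    n_blades: int
    n_cutting_surfaces: int       # ASSUMPTION: counted per tool, not per blade
    n_tools: int                  # how many of this tool the household owns
    length_per_surface_cm: float  # ASSUMPTION: not among the collected fields


def total_cutting_length_cm(records):
    """Add up surfaces x tools x length per surface across all rows."""
    return sum(
        r.n_cutting_surfaces * r.n_tools * r.length_per_surface_cm
        for r in records
    )


# Placeholder rows for illustration only.
records = [
    ToolRecord("wire cutter", 2, 2, 1, 1.5),  # 2 * 1 * 1.5 = 3.0
    ToolRecord("scissors", 2, 2, 3, 8.0),     # 2 * 3 * 8.0 = 48.0
]
print(total_cutting_length_cm(records))  # 51.0
```

Whether the number of cutting surfaces is counted per blade or per tool changes the result, which is exactly the kind of detail the documentation has to settle.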

Important Documentation?

Codebooks

Basic documentation that contains:

  • Variable name in the code
  • Long-form description of what was measured
  • Units of measurement
  • Acceptable values
  • Values used to indicate missingness, refusal to respond, etc.
  • Additional notes that may be relevant

Codebooks are very common for government data - CDC codebooks are especially detailed.
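
A codebook can also be kept in machine-readable form so that data can be checked against it. The sketch below is a minimal illustration covering the items listed above. The variable name, units, acceptable range, and missing-value codes are all hypothetical and do not come from any actual study.

```python
# Hypothetical codebook entry: every name and value here is an assumption.
CODEBOOK = {
    "wearer_weight_kg": {
        "description": "Body weight of the person wearing the shoes",
        "units": "kg",
        "min": 30.0,   # acceptable values: lower bound
        "max": 250.0,  # acceptable values: upper bound
        "missing_codes": {-99: "not measured", -98: "refused to respond"},
        "notes": "Measured in the lab at each measurement period",
    },
}


def check_value(variable, value):
    """Classify a value as missing, ok, or out of range per the codebook."""
    entry = CODEBOOK[variable]
    # Check missing codes first so they are not flagged as out of range.
    if value in entry["missing_codes"]:
        return "missing: " + entry["missing_codes"][value]
    if entry["min"] <= value <= entry["max"]:
        return "ok"
    return "out of range"


print(check_value("wearer_weight_kg", 72.5))  # ok
print(check_value("wearer_weight_kg", -99))   # missing: not measured
print(check_value("wearer_weight_kg", 500))   # out of range
```

Checking the missing codes before the range matters: a code like -99 would otherwise be reported as an impossible weight rather than as a documented gap.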

Data Doc Influences Analysis

  • Experimental design
  • Randomization
  • Sampling strategy
  • Random effects
  • Transformations of collected data
  • Sources of measurement error

Data Documentation

"Documentation is a love letter that you write to your future self."

Damian Conway

Additional Resources

  • DDI Alliance - probably overkill but in a good way

  • Data Librarians are amazing to work with