Data Explorations

Stat 251

2025-02-13

Exploratory Data Analysis

For observational data (as opposed to data from a designed study), a descriptive data exploration is often the only analysis we can do
But with any new data set, you should do some initial exploration: what assumptions are you making (what have you been told about the data?), and do the implied expectations hold up?
Make sure to read through the EDA chapter

In-class exercise

During an unusual episode, a number of people were exposed to risk and some of them died. Your task is to determine the nature of the unusual episode by asking data-motivated yes/no questions

You will see two tables:

deaths by economic status and sex
deaths by economic status and age

What would you expect?

Before seeing the data

what are your expectations regarding exposure by economic status, by sex, and by age?
what are your expectations regarding death numbers/rates by economic status, by sex, and by age?

Now we look at the data: What anomalies do you notice?

Deaths by Economic Status and Sex

                       By Economic Status and Sex
---------------------------------------------------------------------------
           Population Exposed         Number of            Deaths per 100
                to Risk                 Deaths            Exposed to Risk
Economic   ----------------------------------------------------------------
Status     Male  Female  Both     Male  Female  Both     Male  Female  Both
---------------------------------------------------------------------------
I          180     145    325     118      4     122      65      3     37
II         179     106    285     154     13     167      87     12     59
III        510     196    706     422    106     528      83     54     73
IV         862      23    885     670      3     673      78     13     76
---------------------------------------------------------------------------
Total      1731    470   2201     1364    126   1490      80     27     67
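
One quick sanity check on a table like this is to rebuild the derived columns from the raw counts. The sketch below is in Python (the course may well use a different language) and recomputes deaths per 100 exposed from the counts above. The names `counts_by_sex`, `deaths_per_100`, and `recompute_rates` are illustrative, and rounding to the nearest whole number is an assumption about how the printed rates were produced.

```python
# Counts transcribed from the "By Economic Status and Sex" table:
# status -> (exposed_male, exposed_female, deaths_male, deaths_female)
counts_by_sex = {
    "I":   (180, 145, 118, 4),
    "II":  (179, 106, 154, 13),
    "III": (510, 196, 422, 106),
    "IV":  (862, 23, 670, 3),
}


def deaths_per_100(deaths, exposed):
    """Deaths per 100 exposed, rounded to the nearest whole number (assumed rule)."""
    return round(100 * deaths / exposed)


def recompute_rates(counts):
    """Return status -> (male, female, both) rates rebuilt from the raw counts."""
    rates = {}
    for status, (exp_m, exp_f, dead_m, dead_f) in counts.items():
        rates[status] = (
            deaths_per_100(dead_m, exp_m),
            deaths_per_100(dead_f, exp_f),
            deaths_per_100(dead_m + dead_f, exp_m + exp_f),
        )
    return rates


if __name__ == "__main__":
    # Print the recomputed rates so they can be compared with the table by eye.
    for status, (male, female, both) in recompute_rates(counts_by_sex).items():
        print(f"{status:<5} male={male:>3} female={female:>3} both={both:>3}")
```

Comparing the recomputed values with the printed columns is itself a data-motivated question: any cell that differs may point to a different rounding rule or to a transcription slip.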

Deaths by Economic Status and Age

                       By Economic Status and Age
---------------------------------------------------------------------------
           Population Exposed         Number of            Deaths per 100
                to Risk                 Deaths            Exposed to Risk
Economic  -----------------------------------------------------------------
Status    Adult   Child  Both    Adult   Child  Both    Adult   Child  Both
---------------------------------------------------------------------------
I          319      6     325     122      0     122      38      0     37
II         261     24     285     167      0     167      64      0     59
III        627     79     706     476     52     528      76     66     73
IV         885      0     885     673      0     673      76      -     76
---------------------------------------------------------------------------
Total      2092    109   2201     1438    52    1490      69     48     67
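
Another expectation worth checking is that the margins add up: adults plus children should equal the "Both" column, and the rows should sum to the totals. The Python sketch below does that for the counts above and also shows why the child rate for status IV is printed as "-" (nobody was exposed, so the rate is 0/0). All names (`counts_by_age`, `printed_both`, `margin_mismatches`, `safe_rate`) are illustrative, and rounding to the nearest whole number is an assumption.

```python
# Counts transcribed from the "By Economic Status and Age" table:
# status -> (exposed_adult, exposed_child, deaths_adult, deaths_child)
counts_by_age = {
    "I":   (319, 6, 122, 0),
    "II":  (261, 24, 167, 0),
    "III": (627, 79, 476, 52),
    "IV":  (885, 0, 673, 0),
}

# "Both" columns as printed in the table: status -> (exposed, deaths)
printed_both = {
    "I": (325, 122),
    "II": (285, 167),
    "III": (706, 528),
    "IV": (885, 673),
}
printed_total = (2201, 1490)  # (exposed, deaths) from the Total row


def margin_mismatches(counts, both, total):
    """List the rows whose adult + child counts disagree with the printed margins."""
    problems = []
    for status, (exp_a, exp_c, dead_a, dead_c) in counts.items():
        if (exp_a + exp_c, dead_a + dead_c) != both[status]:
            problems.append(status)
    column_sums = (
        sum(exposed for exposed, _ in both.values()),
        sum(deaths for _, deaths in both.values()),
    )
    if column_sums != total:
        problems.append("Total")
    return problems


def safe_rate(deaths, exposed):
    """Deaths per 100 exposed, or None when nobody was exposed (undefined rate)."""
    if exposed == 0:
        return None
    return round(100 * deaths / exposed)


if __name__ == "__main__":
    print("rows with inconsistent margins:",
          margin_mismatches(counts_by_age, printed_both, printed_total))
    print("child rate, status IV:", safe_rate(0, 0))
```

Returning `None` instead of 0 keeps "no one exposed" distinct from "exposed but no deaths", which is the distinction the table draws between "-" and 0.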

Homework - Reading Data with Cookies

Work on homework #4: what are your expectations regarding the data?
One way of asking questions is to rephrase an expectation in the form of ‘(how) does the data deviate from …?’
Generally, when the data does not meet expectations, we find weird stuff …