2025-02-13
For observed data (rather than a study), a (descriptive) data exploration is often the only thing we can do
But with any new data set, you should do some initial exploration: what are the assumptions (what have you been told about the data?) - are implied expectations holding up?
Make sure to read through the EDA chapter
During an unusual episode, a number of people were exposed and some died. You are asked to determine the nature of the unusual episode by asking data-motivated yes/no questions
You will see two tables:
deaths by economic status and sex
deaths by economic status and age
Before seeing the data
what are your expectations regarding exposure by economic status, by sex, and by age?
what are your expectations regarding death numbers/rates by economic status, by sex, and by age?
Now we look at the data: What anomalies do you notice?
By Economic Status and Sex
---------------------------------------------------------------------------
Population Exposed Number of Deaths per 100
to Risk Deaths Exposed to Risk
Economic ----------------------------------------------------------------
Status Male Female Both Male Female Both Male Female Both
---------------------------------------------------------------------------
I 180 145 325 118 4 122 65 3 37
II 179 106 285 154 13 167 87 12 59
III 510 196 706 422 106 528 83 54 73
IV 862 23 885 670 3 673 78 13 76
---------------------------------------------------------------------------
Total 1731 470 2201 1364 126 1490 80 27 67
By Economic Status and Age
---------------------------------------------------------------------------
Population Exposed Number of Deaths per 100
to Risk Deaths Exposed to Risk
Economic -----------------------------------------------------------------
Status Adult Child Both Adult Child Both Adult Child Both
---------------------------------------------------------------------------
I 319 6 325 122 0 122 38 0 37
II 261 24 285 167 0 167 64 0 59
III 627 79 706 476 52 528 76 66 73
IV 885 0 885 673 0 673 76 - 76
---------------------------------------------------------------------------
Total 2092 109 2201 1438 52 1490 69 48 67
Work on homework #4: what are your expectations regarding the data?
One way of asking questions, is to re-phrase an expectation in form of ‘(how) does the data deviate from …?’
Generally, when the data does not meet expectations, we find weird stuff …