Security footage shows that there were 2 witnesses. The first witness lives at the last house on "Northwestern Dr". The second witness, named Annabel, lives somewhere on "Franklin Ave".
2025-04-08
Make sure to look at the relationship between the different data sets!
The first filter statement gives you enough information to start searching for witness statements.
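A minimal sketch of how the clue could be turned into filter statements. The `person` data frame below is a made-up stand-in (none of its rows come from the real data), and reading "last house" as the highest `address_number` on the street is an assumption.

```r
library(dplyr)

# Toy stand-in for the person table; every row is invented for illustration.
person <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Ann Lee", "Bo Chan", "Annabel Roe", "Cy Diaz", "Annabel Fox"),
  address_number = c(12, 480, 35, 101, 77),
  address_street_name = c("Northwestern Dr", "Northwestern Dr", "Franklin Ave",
                          "Franklin Ave", "Wood Glade St"),
  stringsAsFactors = FALSE
)

# Witness 1: assumed to live at the highest house number on Northwestern Dr.
witness1 <- person %>%
  filter(address_street_name == "Northwestern Dr") %>%
  filter(address_number == max(address_number))

# Witness 2: first name Annabel, living on Franklin Ave.
witness2 <- person %>%
  filter(address_street_name == "Franklin Ave", grepl("^Annabel ", name))

witnesses <- bind_rows(witness1, witness2)
witnesses
```

The resulting `id` values are what the witness statements in interview can then be looked up by.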
How do we join person and interview?
The variable id in person links to person_id in interview, so the join is specified with by = c("id" = "person_id").
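As a sketch of this join, the two small data frames below are hypothetical stand-ins for person and interview (ids, names and transcripts are invented). Only the column names `id` and `person_id` are taken from the slides.

```r
library(dplyr)

# Hypothetical miniature versions of the two tables.
person <- data.frame(
  id = c(1, 2, 3),
  name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
interview <- data.frame(
  person_id = c(2, 3),
  transcript = c("first statement", "second statement"),
  stringsAsFactors = FALSE
)

# Keep every person; attach a transcript where an interview exists.
joined <- left_join(person, interview, by = c("id" = "person_id"))
joined
```

Persons without an interview keep their row and get NA in transcript, and the key column appears once under the name used in the left table.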
     id                name license_id address_number address_street_name
1 10000  Christoper Peteuil     993845            624        Bankhall Ave
2 10007 Kourtney Calderwood     861794           2791       Gustavus Blvd
3 10010           Muoi Cary     385336            741     Northwestern Dr
4 10016         Era Moselle     431897           1987       Wood Glade St
5 10025        Trena Hornby     550890            276       Daws Hill Way
6 10027     Antione Godbolt     439509           2431           Zelham Dr
        ssn
1 747714076
2 477972044
3 828638512
4 614621061
5 223877684
6 491650087
                                                                 transcript
1                                                                      <NA>
2                          CHAPTER IV. The Rabbit Sends in a Little Bill\n
3                                                                      <NA>
4                                                                        \n
5                                                                        \n
6 nearer to watch them, and just as she came up to them she heard one of\n
General idea of joining tables
Data sets are joined along values of variables.
In dplyr there are several join functions: left_join, inner_join, full_join, …
Differences between the join functions are only visible if not all values in one set have matching values in the other
  id trt value
1  1   A     5
2  2   B     3
3  3   C     7
4  4   A     1
5  5   B     2
6  6   C     3
left_join: all elements in the left data set are kept
non-matches are filled in by NA
right_join works symmetrically: all elements in the right data set are kept
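The behaviour of the different joins can be sketched with the id/trt/value example data. The second table `df2` and its `stress` column are hypothetical and not from the slides; it deliberately lacks ids 5 and 6 and has an extra id 7.

```r
library(dplyr)

# Left table: the id/trt/value example data.
df1 <- data.frame(
  id = 1:6,
  trt = rep(c("A", "B", "C"), 2),
  value = c(5, 3, 7, 1, 2, 3),
  stringsAsFactors = FALSE
)
# Right table: hypothetical; ids 5 and 6 are missing, id 7 has no partner.
df2 <- data.frame(
  id = c(1L, 2L, 3L, 4L, 7L),
  stress = c(0, 3, 6, 9, 1)
)

left  <- left_join(df1, df2, by = "id")   # all rows of df1, NA stress for ids 5, 6
right <- right_join(df1, df2, by = "id")  # all rows of df2, NA trt/value for id 7
inner <- inner_join(df1, df2, by = "id")  # only ids present in both
full  <- full_join(df1, df2, by = "id")   # every id from either table

sapply(list(left = left, right = right, inner = inner, full = full), nrow)
```

The row counts differ only because the two id columns do not cover the same values; with identical ids all four joins would return the same rows.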
sometimes we unexpectedly cannot match values: missing values, different spelling, …
join can be along multiple variables, e.g. by = c("ID", "Date")
joining variable(s) can have different names, e.g. by = c("State" = "Name")
always make sure to check dimensions of data before and after a join
check for missing values after the join; anti_join helps by returning the rows that have no match
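A small sketch of these checks, under stated assumptions: the tables `states` and `pops`, their columns `cases` and `pop`, and all values are invented for illustration. Only the key names State and Name come from the example above. One spelling mismatch ("ohio") is planted on purpose.

```r
library(dplyr)

# Hypothetical tables whose key columns have different names.
states <- data.frame(
  State = c("Iowa", "Ohio", "Texas"),
  cases = c(10, 20, 30),
  stringsAsFactors = FALSE
)
pops <- data.frame(
  Name = c("Iowa", "ohio", "Texas"),  # "ohio" is a deliberate misspelling
  pop = c(1, 2, 3),
  stringsAsFactors = FALSE
)

joined <- left_join(states, pops, by = c("State" = "Name"))

# Check dimensions before and after: same number of rows, one extra column.
dim(states)
dim(joined)

# Rows of states that found no partner in pops.
unmatched <- anti_join(states, pops, by = c("State" = "Name"))
unmatched
```

Here the dimensions look fine, so the failed match only shows up as an NA in `pop` or as a row returned by `anti_join`.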