[1] 6
[1] 2556268
[1] 0.4285714
exp(x)
log(x)
log(x, base = 10)
sin(x)
asin(x)
cos(x)
tan(x)
We can create variables using the assignment operator <-:
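A minimal sketch of assignment. The names x and y and their values are made up for illustration:

```r
# Assign values to two variables (names and values are illustrative)
x <- 5
y <- 2

# Typing a variable's name at the console prints its value
x
y
```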
We can then perform any of the functions on the variables:
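For instance, assuming two illustrative variables x and y (not taken from the original slides), arithmetic and the functions listed earlier work on them directly:

```r
# Illustrative values
x <- 5
y <- 2

total <- x + y        # addition
ratio <- x / y        # division
total
ratio

exp(x)                # exponential
log(x, base = 10)     # base-10 logarithm
sin(y)                # sine (argument in radians)
```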
Variable names can’t start with a number
Variables in R are case-sensitive
Some common letters are used internally by R and should be avoided as variable names (c, q, t, C, D, F, T, I)
There are reserved words that R won’t let you use for variable names. (for, in, while, if, else, repeat, break, next)
R will let you use the name of a predefined function. Try not to overwrite those though!
A variable does not need to be a single value. We can create a vector using the c function (combine - combines several objects into one):
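A short sketch of c in use. The vector names and values are illustrative:

```r
# Combine four numbers into one vector
y <- c(1, 5, 3, 2)
y

# c() can also combine an existing vector with new values
z <- c(y, 10, 20)
z
```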
Operations will then be done element-wise:
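As an illustration with a made-up vector, each operation is applied to every element in turn:

```r
y <- c(1, 5, 3, 2)             # illustrative values

halved <- y / 2                # 0.5 2.5 1.5 1.0
shifted <- y + c(10, 20, 30, 40)  # 11 25 33 42 (paired up element by element)

halved
shifted
```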
We will talk MUCH more about vectors in a bit, but for now, let’s talk about a couple of ways to get help. The primary function to use is the help function. Just pass in the name of the function you need help with:
The ? function also works:
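Both ways of looking up help can be sketched as follows. The mean function is just an example topic chosen here:

```r
# Open the help page for the mean function
help(mean)

# The ? shortcut opens the same page
?mean
```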
Googling for help can be difficult at first. You might need to search for R + CRAN + <your query> to get good results.
Stack Overflow is VERY helpful
R Reference Card
You can download the reference card from:
Having this open or printed off and near you while working is helpful.
RStudio cheatsheets
The RStudio cheatsheets are VERY helpful.
Using the R Reference Card (and the Help pages, if needed), do the following:
Find out how many rows and columns the iris data set has (use data(iris) to load in the dataset). Figure out at least 2 ways to do this.
Hint: “Variable Information” section on the first page of the reference card!
Use the rep function to construct the following vector: 1 1 2 2 3 3 4 4 5 5
Hint: “Data Creation” section of the reference card
Use rep to construct this vector: 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
final_shed is a data frame.
Data frames hold data sets
Columns need not all be the same type, much like an Excel spreadsheet
Each column in a data frame is a vector [1], so each column needs to have values that are all the same type.
We can access different columns using the $ operator.
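A minimal sketch of column access. Since final_shed itself is not loaded here, the code builds a small stand-in called pigs (a hypothetical name) from four rows of final_shed that are printed elsewhere in this section:

```r
# Toy stand-in for final_shed (the name "pigs" is hypothetical)
pigs <- data.frame(
  pignum = c(122, 224, 337, 419),
  treatment = c("control", "control", "control", "control"),
  total_shedding = c(59.0, 56.1, 66.2, 52.0),
  stringsAsFactors = FALSE
)

# Each column comes back as a vector
pigs$total_shedding
pigs$treatment
```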
.footnote[ [1] A column can also be a list! This is a more advanced topic that will be saved for later.]
A vector is a list of values that are all the same type. We have seen that we can create them using the c or the rep function. We can also use the : operator if we wish to create consecutive values:
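For example, with illustrative endpoints:

```r
# Consecutive integers from 10 up to 15
a <- 10:15
a          # 10 11 12 13 14 15

# The sequence counts down when the first value is larger
b <- 5:1
b          # 5 4 3 2 1
```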
We can extract the different elements of the vector like so:
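A small sketch using a made-up vector a. Note that R counts positions starting from 1:

```r
a <- 10:15     # illustrative vector

first <- a[1]  # first element: 10
fourth <- a[4] # fourth element: 13

first
fourth
```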
We saw that we can access individual elements of the vector. But indexing is a lot more powerful than that:
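A few of the other indexing forms, sketched on a made-up vector a:

```r
a <- 10:15               # illustrative vector

picked <- a[c(1, 3, 5)]  # several positions at once: 10 12 14
middle <- a[2:4]         # a range of positions: 11 12 13
no_first <- a[-1]        # negative index drops that position: 11 12 13 14 15

picked
middle
no_first
```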
R has built in support for logical values
TRUE and FALSE are built in. T (for TRUE) and F (for FALSE) are supported, but they are ordinary variables that can be overwritten, so TRUE and FALSE are safer
Logicals can result from a comparison using
<: “less than”
>: “greater than”
<=: “less than or equal to”
>=: “greater than or equal to”
==: “is equal to”
!=: “not equal to”
We can index vectors using logical values as well:
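A sketch of logical indexing on a made-up vector a. Only the positions where the logical vector is TRUE are kept:

```r
a <- 10:15         # illustrative vector

keep <- a > 12     # FALSE FALSE FALSE TRUE TRUE TRUE
keep

big <- a[keep]     # 13 14 15
big

not11 <- a[a != 11]  # the comparison can go straight inside the brackets
not11
```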
Hint: if you use the sum function on a logical vector, it’ll return how many TRUEs are in the vector:
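For example, counting how many elements of a made-up vector exceed 12:

```r
a <- 10:15            # illustrative vector

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
n_big <- sum(a > 12)
n_big                 # 3
```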
& (elementwise AND)
| (elementwise OR)
[1] TRUE FALSE FALSE FALSE
[1] TRUE TRUE TRUE FALSE
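Output of this shape arises, for instance, from combining two logical vectors that cover every TRUE/FALSE pairing. The vectors p and q below are an assumption, chosen because they reproduce the two printed results:

```r
# Two logical vectors covering all four TRUE/FALSE combinations (assumed inputs)
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)

both <- p & q     # TRUE only where both are TRUE
either <- p | q   # TRUE where at least one is TRUE

both              # TRUE FALSE FALSE FALSE
either            # TRUE TRUE TRUE FALSE
```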
# Which are high shedders in the control group?
id <- (shedding > 50 & treatment == "control")
final_shed[id,]
# A tibble: 4 × 7
pignum time_point pig_weight daily_shedding treatment total_shedding gain
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 122 21 33.9 5.01 control 59.0 16.8
2 224 21 22.9 3.91 control 56.1 11.4
3 337 21 29.5 5.52 control 66.2 16.2
4 419 21 31 6.21 control 52.0 16.8
We can modify vectors using indexing as well:
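A sketch of modification by position and by condition, using a made-up vector a:

```r
a <- 10:15          # illustrative vector

a[2] <- 0           # replace the second element
a[a > 13] <- 13     # cap every value above 13 at 13

a                   # 10 0 12 13 13 13
```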
Elements of a vector must all be the same type:
[1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
[1] "37.1402150411922" "43.8807276727966" ":-(" "44.9696314253854"
[5] "38.7434232007542" ":-("
By changing some values to a string, all the other values were converted to strings as well.
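The same effect can be reproduced on a small scale. This sketch assumes a stand-alone vector x holding the six numbers printed above, and that values over 50 were the ones replaced:

```r
# Six values copied from the numeric output above
x <- c(37.14022, 43.88073, 59.04973, 44.96963, 38.74342, 56.12656)
class(x)            # "numeric"

# Replacing some elements with a string forces the whole vector to character
x[x > 50] <- ":-("
x
class(x)            # "character"
```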
We can use mode or class to find out information about variables
str is useful to find information about the structure of your data
There are many data types; numeric, integer, character, Date, and factor are the most common
tibble [59 × 7] (S3: tbl_df/tbl/data.frame)
$ pignum : num [1:59] 77 87 122 160 191 224 337 345 419 458 ...
$ time_point : num [1:59] 21 21 21 21 21 21 21 21 21 21 ...
$ pig_weight : num [1:59] 25.4 23.9 33.9 28.4 28.9 ...
$ daily_shedding: num [1:59] 4.61 3.91 5.01 3.91 3.91 ...
$ treatment : chr [1:59] "control" "control" "control" "control" ...
$ total_shedding: num [1:59] 37.1 43.9 59 45 38.7 ...
$ gain : num [1:59] 13.9 11.7 16.8 15.1 14.6 ...
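A sketch of these inspection functions on a small stand-in data frame. The name pigs is hypothetical, and the rows are copied from the final_shed output shown earlier:

```r
# Toy stand-in for final_shed (the name "pigs" is hypothetical)
pigs <- data.frame(
  pignum = c(122, 224, 337, 419),
  treatment = c("control", "control", "control", "control"),
  total_shedding = c(59.0, 56.1, 66.2, 52.0),
  stringsAsFactors = FALSE
)

class(pigs)            # "data.frame"
class(pigs$pignum)     # "numeric"
mode(pigs$treatment)   # "character"

# str() summarises every column in one call
str(pigs)
```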
We can convert between different types using the as series of functions:
[1] 77 87 122 160 191 224
[1] "77" "87" "122" "160" "191" "224"
[1] 77
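Conversions like the ones printed above can be sketched as follows. The stand-alone vector pignum is an assumption, holding the six values from the first line of output:

```r
# Six values copied from the output above
pignum <- c(77, 87, 122, 160, 191, 224)

as_text <- as.character(pignum)   # numbers become strings
as_text

back <- as.numeric("77")          # a string becomes a number again
back
```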
There is a whole variety of useful functions that operate on vectors.
A couple of the more common ones are length, which returns the length (number of elements) of a vector, and sum, which adds up all the elements of a vector.
Note: Be careful, if you use length with a data frame, it will return the number of variables (columns), not the number of observations (rows).
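A short sketch of both functions, including the data frame caveat. The vector a is made up, and iris is the built-in data set mentioned earlier:

```r
a <- 10:15       # illustrative vector

length(a)        # 6 elements
sum(a)           # 75

# On a data frame, length() counts columns, not rows
length(iris)     # 5 variables
nrow(iris)       # 150 observations
```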
Using the basic functions we’ve learned, it wouldn’t be hard to compute some basic statistics.
[1] 59
[1] 28.82305
[1] 4.10429
But we don’t have to.
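For comparison, here is a sketch of the by-hand approach next to R's built-in mean and sd functions. The vector x is an assumption, holding six shedding values printed earlier rather than the full data:

```r
# Six illustrative values
x <- c(37.14022, 43.88073, 59.04973, 44.96963, 38.74342, 56.12656)

# By hand, using only length() and sum()
n <- length(x)
mean_x <- sum(x) / n
sd_x <- sqrt(sum((x - mean_x)^2) / (n - 1))   # sample standard deviation

# Built-in equivalents give the same answers
all.equal(mean_x, mean(x))
all.equal(sd_x, sd(x))
```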
Which pigs have a shedding value less than or equal to 30 OR are in the Acid treatment group?
Explore any other calculations on the dataset that you find interesting.