Basics

Overgrown Calculator

Basics Cheatsheet

# Addition and Subtraction
2 + 5 - 1
[1] 6
# Multiplication
109*23452
[1] 2556268
# Division
3/7
[1] 0.4285714

More Calculator Operations

# Integer division
7 %/% 2
[1] 3
# Modulo operator (Remainder)
7 %% 2
[1] 1
# Powers
1.5^3
[1] 3.375

Even More Functions

  • Exponentiation
    • exp(x)
  • Logarithms
    • log(x)
    • log(x, base = 10)
  • Trigonometric functions
    • sin(x)
    • asin(x)
    • cos(x)
    • tan(x)
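
As a minimal sketch of the functions listed above (the input values are chosen here only for illustration): log(x) is the natural logarithm unless a base is given, and the trigonometric functions work in radians.

```r
# exp() and log() are inverses; log() defaults to the natural logarithm
e_value <- exp(1)          # Euler's number, about 2.718282
log(e_value)               # 1
log(100, base = 10)        # 2

# Trigonometric functions take angles in radians
sin(pi / 2)                # 1
cos(0)                     # 1
quarter_turn <- asin(1)    # pi / 2, about 1.570796
```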

Creating Variables

We can create variables using the assignment operator <-:

x <- 5
MyAge <- 25

We can then perform any of the functions on the variables:

log(x)
[1] 1.609438
MyAge^2
[1] 625

Rules for Variable Creation

  • Variable names can’t start with a number

  • Variables in R are case-sensitive

  • Some common letters are used internally by R and should be avoided as variable names (c, q, t, C, D, F, T, I)

  • There are reserved words that R won’t let you use for variable names (for, in, while, if, else, repeat, break, next)

  • R will let you use the name of a predefined function. Try not to overwrite those though!
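
A short sketch of two of these rules, using made-up variable names: names are case-sensitive, and R will not stop you from overwriting a predefined name such as T.

```r
# Names are case-sensitive: these are two different variables
myage <- 25
MyAge <- 30
myage == MyAge         # FALSE

# Overwriting a predefined name is allowed, which is why it is risky
T <- 0                 # T no longer means TRUE in this session
isTRUE(T)              # FALSE
rm(T)                  # remove our copy; T refers to TRUE again
isTRUE(T)              # TRUE
```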

Vectors

A variable does not need to be a single value. We can create a vector using the c function (combine - combines several objects into one):

y <- c(1, 5, 3, 2)

Operations will then be done element-wise:

y / 2
[1] 0.5 2.5 1.5 1.0
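
The same element-wise behaviour applies when both operands are vectors. A small sketch, where z is a hypothetical second vector of the same length as y:

```r
y <- c(1, 5, 3, 2)
z <- c(10, 20, 30, 40)   # hypothetical second vector

y + z      # 11 25 33 42
y * z      # 10 100 90 80
y^2        # 1 25 9 4
sqrt(y)    # functions are applied to each element as well
```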

Getting Help

We will talk MUCH more about vectors in a bit, but for now, let’s talk about a couple ways to get help. The primary function to use is the help function. Just pass in the name of the function you need help with:

help(head)

The ? operator also works:

?head

Googling for help can be difficult at first. You might need to search for R + CRAN + <your query> to get good results.

Stack Overflow is VERY helpful.

Getting Help

R Reference Card

You can download the reference card from:

Here

Having this open or printed off and near you while working is helpful.


RStudio cheatsheets

The RStudio cheatsheets are VERY helpful.

Warnings vs. Errors

  • Beginners to R often panic when they see a red message, even one as innocuous as confirmation that a library loaded
    • Not all red text means that there is an error!
  • A warning is a message that does not disturb the program flow but is displayed along with the output
    • Not always a cause for concern
  • An error stops the program from running
    • Google is a beautiful thing
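
A minimal sketch of the difference, using inputs invented for illustration. The tryCatch call is not part of the material above; it is used here only so the example can keep going after the error.

```r
# A warning: the result (NA) is still returned and the code keeps running
value <- as.numeric("abc")   # Warning: NAs introduced by coercion
is.na(value)                 # TRUE

# An error: log() cannot take a character argument, so evaluation stops.
# tryCatch() captures the error message instead of halting the script.
result <- tryCatch(
  log("abc"),
  error = function(e) conditionMessage(e)
)
result   # the error message as a character string
```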

Your Turn

Using the R Reference Card (and the Help pages, if needed), do the following:

  • Find out how many rows and columns the iris data set has (use data(iris) to load in the dataset). Figure out at least 2 ways to do this.

Hint: “Variable Information” section on the first page of the reference card!

  • Use the rep function to construct the following vector: 1 1 2 2 3 3 4 4 5 5

Hint: “Data Creation” section of the reference card

  • Use rep to construct this vector: 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

Data Frames: Introduction

  • final_shed is a data frame.

  • Data frames hold data sets

  • Not every column need be the same type - like an Excel spreadsheet

  • Each column in a data frame is a vector [1], so each column needs to have values that are all the same type.

  • We can access different columns using the $ operator.

shedding <- final_shed$total_shedding
treatment <- final_shed$treatment

[1] A column can also be a list! This is a more advanced topic that will be saved for later.
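
Since final_shed may not be loaded, here is a sketch with a small made-up data frame. The name pigs, its column names, and its values are all hypothetical.

```r
# A small made-up data frame; names and values are hypothetical
pigs <- data.frame(
  pignum    = c(1, 2, 3),
  treatment = c("control", "acid", "control"),
  weight    = c(25.4, 23.9, 33.9),
  stringsAsFactors = FALSE
)

# Columns can differ in type, but each column is a single-type vector
pigs$weight            # 25.4 23.9 33.9
class(pigs$weight)     # "numeric"
class(pigs$treatment)  # "character"
```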

More about Vectors

A vector is an ordered collection of values that are all the same type. We have seen that we can create them using the c or the rep function. We can also use the : operator to create consecutive values:

a <- 10:15
a
[1] 10 11 12 13 14 15

We can extract the different elements of the vector like so:

shedding[3]
[1] 59.04973

Indexing Vectors

We saw that we can access individual elements of the vector. But indexing is a lot more powerful than that:

head(shedding)
[1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
shedding[c(1, 3, 5)]
[1] 37.14022 59.04973 38.74342
shedding[1:5]
[1] 37.14022 43.88073 59.04973 44.96963 38.74342
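
The same indexing patterns on a self-contained vector of made-up values. The negative index is an extra not shown above: it drops the given element rather than selecting it.

```r
v <- c(37.1, 43.9, 59.0, 45.0, 38.7, 56.1)  # made-up values

v[2]            # a single element: 43.9
v[c(2, 4)]      # elements 2 and 4: 43.9 45.0
v[2:4]          # a consecutive range: 43.9 59.0 45.0
v[-1]           # a negative index drops that element
v[length(v)]    # the last element: 56.1
```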

Logical Values

  • R has built in support for logical values

  • TRUE and FALSE are built in. T (for TRUE) and F (for FALSE) also work, but they are ordinary variables that can be reassigned, so TRUE and FALSE are safer

  • Logicals can result from a comparison using

    • < : “less than”
    • > : “greater than”
    • <= : “less than or equal to”
    • >= : “greater than or equal to”
    • == : “is equal to”
    • != : “not equal to”
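
A brief sketch of these comparison operators, with values chosen only for illustration. Note that == tests equality, while a single = is an assignment.

```r
a <- 3
a < 5        # TRUE
a == 3       # TRUE
a != 3       # FALSE

# Comparisons are element-wise on vectors
w <- c(2, 3, 5, 7)
w >= 5       # FALSE FALSE TRUE TRUE
w == 3       # FALSE TRUE FALSE FALSE
```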

Indexing with Logicals

We can index vectors using logical values as well:

x <- c(2, 3, 5, 7)
x[c(TRUE, FALSE, FALSE, TRUE)]
[1] 2 7
x > 3.5
[1] FALSE FALSE  TRUE  TRUE
x[x > 3.5]
[1] 5 7

Logical Examples

bad_shedder <- shedding > 50
shedding[bad_shedder]
[1] 59.04973 56.12656 66.20657 51.98984 58.53921 64.74017 64.27066 56.06566
[9] 53.76049

Your Turn

  • Find out how many pigs had a total shedding value of less than 30 log10 CFUs.

Hint: if you use the sum function on a logical vector, it’ll return how many TRUEs are in the vector:

sum(c(TRUE, TRUE, FALSE, TRUE, FALSE))
[1] 3
  • More Challenging: Calculate the sum of the total shedding log10 CFUs of all pigs with a total shedding value of less than 30 log10 CFUs.

Element-wise Logical Operators

  • & (elementwise AND)
  • | (elementwise OR)
c(T, T, F, F) & c(T, F, T, F)
[1]  TRUE FALSE FALSE FALSE
c(T, T, F, F) | c(T, F, T, F)
[1]  TRUE  TRUE  TRUE FALSE
# Which are high shedders in the control group?
id <- (shedding > 50 & treatment == "control")
final_shed[id,]
# A tibble: 4 × 7
  pignum time_point pig_weight daily_shedding treatment total_shedding  gain
   <dbl>      <dbl>      <dbl>          <dbl> <chr>              <dbl> <dbl>
1    122         21       33.9           5.01 control             59.0  16.8
2    224         21       22.9           3.91 control             56.1  11.4
3    337         21       29.5           5.52 control             66.2  16.2
4    419         21       31             6.21 control             52.0  16.8

Modifying Vectors

We can modify vectors using indexing as well:

x <- shedding[1:5]
x
[1] 37.14022 43.88073 59.04973 44.96963 38.74342
x[1] <- 20
x
[1] 20.00000 43.88073 59.04973 44.96963 38.74342

Vector Elements

Elements of a vector must all be the same type:

head(shedding)
[1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
shedding[bad_shedder] <- ":-("
head(shedding)
[1] "37.1402150411922" "43.8807276727966" ":-("              "44.9696314253854"
[5] "38.7434232007542" ":-("             

Changing one value to a string converted every other value to a string as well, because a vector can hold only one type.
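
This conversion is called coercion, and it can be seen on a smaller scale with vectors made up for illustration:

```r
nums <- c(1.5, 2, 3)
class(nums)             # "numeric"

mixed <- c(1.5, 2, "a")
mixed                   # "1.5" "2" "a"
class(mixed)            # "character"

# Logicals are coerced to numbers when mixed with them
flags <- c(TRUE, FALSE, 2)
flags                   # 1 0 2
```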

Data Types in R

  • Can use mode or class to find out information about variables

  • str is useful to find information about the structure of your data

  • Many data types: numeric, integer, character, Date, and factor most common

str(final_shed)
tibble [59 × 7] (S3: tbl_df/tbl/data.frame)
 $ pignum        : num [1:59] 77 87 122 160 191 224 337 345 419 458 ...
 $ time_point    : num [1:59] 21 21 21 21 21 21 21 21 21 21 ...
 $ pig_weight    : num [1:59] 25.4 23.9 33.9 28.4 28.9 ...
 $ daily_shedding: num [1:59] 4.61 3.91 5.01 3.91 3.91 ...
 $ treatment     : chr [1:59] "control" "control" "control" "control" ...
 $ total_shedding: num [1:59] 37.1 43.9 59 45 38.7 ...
 $ gain          : num [1:59] 13.9 11.7 16.8 15.1 14.6 ...
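
A small sketch of class, mode, and str on single values. The variable names are hypothetical and the values are made up.

```r
n <- 3.14
s <- "control"
i <- 5L          # the L suffix makes an integer

class(n)   # "numeric"
class(s)   # "character"
class(i)   # "integer"
mode(i)    # "numeric": mode groups integer and double together

str(c(1.5, 2, 3))   # compact one-line summary of a vector
```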

Converting Between Types

We can convert between different types using the as series of functions:

pignum <- head(final_shed$pignum)
pignum
[1]  77  87 122 160 191 224
as.character(pignum)
[1] "77"  "87"  "122" "160" "191" "224"
as.numeric("77")
[1] 77
#as.factor()
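
A few more conversions, sketched with made-up inputs, including the as.factor function mentioned in the comment above:

```r
# Character strings that look like numbers convert cleanly
ids <- as.numeric(c("77", "87"))   # 77 87

# as.integer() truncates toward zero rather than rounding
as.integer(3.9)                    # 3

# as.factor() turns a vector into a categorical variable
trt <- as.factor(c("control", "acid", "control"))
levels(trt)                        # "acid" "control"
as.integer(trt)                    # 2 1 2 (the underlying level codes)
```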

More About Data Types

Some useful functions

There are a whole variety of useful functions to operate on vectors.

A couple of the more common ones are length, which returns the length (number of elements) of a vector, and sum, which adds up all the elements of a vector.

pig_weight <- final_shed$pig_weight
x <- pig_weight[1:5]
length(x)
[1] 5
sum(x)
[1] 140.36

Note: Be careful. If you use length on a data frame, it returns the number of variables (columns), not the number of observations (rows).

class(shed)
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 
length(shed)
[1] 12
prod(dim(shed))
[1] 3540
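
To get the number of observations directly, nrow, ncol, and dim are the usual tools. A sketch on a hypothetical data frame named toy:

```r
# A made-up data frame with 4 rows and 3 columns
toy <- data.frame(
  id    = 1:4,
  score = c(2.5, 3.5, 4.5, 5.5),
  group = c("a", "b", "a", "b")
)

length(toy)   # 3, the number of columns
ncol(toy)     # 3
nrow(toy)     # 4, the number of observations
dim(toy)      # 4 3 (rows, then columns)
```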

Statistical Functions

Using the basic functions we’ve learned, it wouldn’t be hard to compute some basic statistics.

(n <- length(pig_weight))
[1] 59
(meanweight <- sum(pig_weight) / n)
[1] 28.82305
(standdev <- sqrt(sum((pig_weight - meanweight)^2) / (n - 1)))
[1] 4.10429

But we don’t have to.

Built-in Statistical Functions

mean(pig_weight)
[1] 28.82305
sd(pig_weight)
[1] 4.10429
summary(pig_weight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  19.50   25.79   28.80   28.82   32.24   36.30 
quantile(pig_weight, c(.025, .975))
  2.5%  97.5% 
22.279 35.952 
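
As a quick check that the hand-written formulas and the built-in functions agree, here is a sketch on a small vector of made-up weights:

```r
w <- c(25.4, 23.9, 33.9, 28.4, 28.9)   # made-up weights
n <- length(w)

manual_mean <- sum(w) / n
manual_sd <- sqrt(sum((w - manual_mean)^2) / (n - 1))

all.equal(manual_mean, mean(w))   # TRUE
all.equal(manual_sd, sd(w))       # TRUE
median(w)                         # 28.4
```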

Your Turn

  • Which pigs have a shedding value less than or equal to 30 OR are in the Acid treatment group?

  • Explore any more calculations in the dataset you may find interesting.