Data Structures

Matrices

  • A matrix is a rectangular arrangement of numbers in rows and columns
  • There are some functions in R that will require your data to be arranged as a matrix.
# Creating a matrix (3x3)
row_matrix <-  matrix(
  # Taking sequence of elements 
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  # Number of rows
  nrow = 3,  
  # No. of columns
  ncol = 3,        
  # By default matrices are in column-wise order
  # So this parameter decides how to arrange the matrix
  byrow = TRUE         
)

row_matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Naming Rows and Columns in Matrices

# Naming rows
rownames(row_matrix) <- c("r1", "r2", "r3")
  
# Naming columns
colnames(row_matrix) <- c("c1", "c2", "c3")

row_matrix
   c1 c2 c3
r1  1  2  3
r2  4  5  6
r3  7  8  9

Indexing Elements in a Matrix

If you only need certain elements of a matrix, you can index them by row and column:

#matrix[row,col]

row_matrix[1,]
c1 c2 c3 
 1  2  3 
row_matrix[,1]
r1 r2 r3 
 1  4  7 
row_matrix[c(1,3),2]
r1 r3 
 2  8 
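
Because row_matrix has row and column names, its elements can also be selected by name. The sketch below rebuilds the matrix so that it runs on its own (using 1:9 and dimnames as a shortcut, which is an assumption of this example rather than the code above). It also shows the standard drop = FALSE argument of [ , which keeps a single row as a matrix:

```r
row_matrix <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE,
                     dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

# Select a single element by row name and column name
by_name <- row_matrix["r2", "c3"]
by_name                               # 6

# Selecting one row normally returns a plain (named) vector
row_vec <- row_matrix[1, ]

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- row_matrix[1, , drop = FALSE]
dim(row_mat)                          # 1 3
```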

Matrix Multiplication

A <- matrix(c(1, 2, 3, 4), ncol=2)
A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
B <- matrix(c(5, 6, 7, 8), ncol=2) # would print out similarly

A%*%B
     [,1] [,2]
[1,]   23   31
[2,]   34   46
# Note * does element by element multiplication

t(A) #transpose of A
     [,1] [,2]
[1,]    1    2
[2,]    3    4
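
To make the difference between * and %*% concrete, here is a small sketch using the same A and B as above. The names elementwise and matprod are only illustrative:

```r
A <- matrix(c(1, 2, 3, 4), ncol = 2)
B <- matrix(c(5, 6, 7, 8), ncol = 2)

elementwise <- A * B    # multiplies entries in matching positions
matprod <- A %*% B      # row-by-column matrix product

elementwise
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32
```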

Your Turn

  • In the row_matrix code, change ‘byrow = TRUE’ to ‘byrow = FALSE’ to see what happens
  • Recreate this matrix
     [,1] [,2]
[1,]    2    8
[2,]    4   10
[3,]    6   12
  • Index the above matrix to return “4”

Data Frames

  • Data frames are the workhorse of R objects

  • Structured by rows and columns and can be indexed

  • Each column is a variable of one type

  • Column names can be used to index a variable

  • Advice for naming variables applies to naming columns

  • Can be specified by grouping vectors of equal length as columns
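
As a minimal sketch of the last point, the data.frame function groups vectors of equal length into columns. The vectors and their names below are made up for illustration:

```r
# Hypothetical example vectors, all of length 4
plant_id <- 1:4
height_cm <- c(12.5, 15.1, 9.8, 14.0)
treated <- c(TRUE, FALSE, TRUE, FALSE)

# Each vector becomes one column (one variable of one type)
plants <- data.frame(plant_id, height_cm, treated)

names(plants)       # column names are taken from the vector names
plants$height_cm    # a column name can be used to pull out one variable
```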

Data Frame Indexing

  • Elements indexed similar to a vector using [ ]

  • df[i,j] will select the element in the \(i^{th}\) row and \(j^{th}\) column

  • df[ ,j] will select the entire \(j^{th}\) column and treat it as a vector

  • df[i ,] will select the entire \(i^{th}\) row and return it as a one-row data frame

  • Logical or integer vectors can also be used in place of i and j to subset the row and columns
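
The indexing forms above can be tried on a small data frame. This is only a sketch: df and its columns are hypothetical, chosen so the results are easy to check by eye.

```r
df <- data.frame(id = 1:4,
                 score = c(90, 72, 85, 60),
                 group = c("a", "b", "a", "b"))

one_value <- df[2, 2]              # element in row 2, column 2 (72)
one_col   <- df[, 2]               # whole second column, as a vector
one_row   <- df[2, ]               # whole second row, as a one-row data frame
high      <- df[df$score > 80, ]   # logical vector keeps rows 1 and 3
corners   <- df[c(1, 4), c(1, 3)]  # integer vectors: rows 1 and 4, columns 1 and 3

high
```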

Adding a new Variable to a Data Frame

  • Maybe you need to do a calculation using data in your data frame

  • Create a new vector that is the same length as other columns

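  • Append the new column to the data frame with the $ operator (df$new_name <- vector) or with data.frame(df, vector)

  • With $, the column takes the name written after the $; with data.frame(), it adopts the name of the vector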
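
A short sketch of appending a column with the $ operator, using a copy of the built-in iris data. The column name petal_area and the length-times-width calculation are illustrative assumptions, not part of the demo that follows:

```r
flower <- iris   # work on a copy of the built-in data set

# Calculate a new vector with one value per row
petal_area <- flower$Petal.Length * flower$Petal.Width

# Append it with $; the column takes the name written after the $
flower$petal_area <- petal_area

ncol(flower)     # 6: the five original columns plus the new one
```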

Data Frame Demo

Use Edgar Anderson’s Iris Data:

flower <- iris

head(flower)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Data Frame Demo

Select Species column (5th column):

flower[,5]
  [1] setosa     setosa     setosa     setosa     setosa     setosa    
  [7] setosa     setosa     setosa     setosa     setosa     setosa    
 [13] setosa     setosa     setosa     setosa     setosa     setosa    
 [19] setosa     setosa     setosa     setosa     setosa     setosa    
 [25] setosa     setosa     setosa     setosa     setosa     setosa    
 [31] setosa     setosa     setosa     setosa     setosa     setosa    
 [37] setosa     setosa     setosa     setosa     setosa     setosa    
 [43] setosa     setosa     setosa     setosa     setosa     setosa    
 [49] setosa     setosa     versicolor versicolor versicolor versicolor
 [55] versicolor versicolor versicolor versicolor versicolor versicolor
 [61] versicolor versicolor versicolor versicolor versicolor versicolor
 [67] versicolor versicolor versicolor versicolor versicolor versicolor
 [73] versicolor versicolor versicolor versicolor versicolor versicolor
 [79] versicolor versicolor versicolor versicolor versicolor versicolor
 [85] versicolor versicolor versicolor versicolor versicolor versicolor
 [91] versicolor versicolor versicolor versicolor versicolor versicolor
 [97] versicolor versicolor versicolor versicolor virginica  virginica 
[103] virginica  virginica  virginica  virginica  virginica  virginica 
[109] virginica  virginica  virginica  virginica  virginica  virginica 
[115] virginica  virginica  virginica  virginica  virginica  virginica 
[121] virginica  virginica  virginica  virginica  virginica  virginica 
[127] virginica  virginica  virginica  virginica  virginica  virginica 
[133] virginica  virginica  virginica  virginica  virginica  virginica 
[139] virginica  virginica  virginica  virginica  virginica  virginica 
[145] virginica  virginica  virginica  virginica  virginica  virginica 
Levels: setosa versicolor virginica

Demo (Continued)

Select Species column with the $ operator:

flower$Species
  [1] setosa     setosa     setosa     setosa     setosa     setosa    
  [7] setosa     setosa     setosa     setosa     setosa     setosa    
 [13] setosa     setosa     setosa     setosa     setosa     setosa    
 [19] setosa     setosa     setosa     setosa     setosa     setosa    
 [25] setosa     setosa     setosa     setosa     setosa     setosa    
 [31] setosa     setosa     setosa     setosa     setosa     setosa    
 [37] setosa     setosa     setosa     setosa     setosa     setosa    
 [43] setosa     setosa     setosa     setosa     setosa     setosa    
 [49] setosa     setosa     versicolor versicolor versicolor versicolor
 [55] versicolor versicolor versicolor versicolor versicolor versicolor
 [61] versicolor versicolor versicolor versicolor versicolor versicolor
 [67] versicolor versicolor versicolor versicolor versicolor versicolor
 [73] versicolor versicolor versicolor versicolor versicolor versicolor
 [79] versicolor versicolor versicolor versicolor versicolor versicolor
 [85] versicolor versicolor versicolor versicolor versicolor versicolor
 [91] versicolor versicolor versicolor versicolor versicolor versicolor
 [97] versicolor versicolor versicolor versicolor virginica  virginica 
[103] virginica  virginica  virginica  virginica  virginica  virginica 
[109] virginica  virginica  virginica  virginica  virginica  virginica 
[115] virginica  virginica  virginica  virginica  virginica  virginica 
[121] virginica  virginica  virginica  virginica  virginica  virginica 
[127] virginica  virginica  virginica  virginica  virginica  virginica 
[133] virginica  virginica  virginica  virginica  virginica  virginica 
[139] virginica  virginica  virginica  virginica  virginica  virginica 
[145] virginica  virginica  virginica  virginica  virginica  virginica 
Levels: setosa versicolor virginica

Demo (Continued)

flower$Species == "setosa"
  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [37]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [49]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE

Demo (Continued)

flower[flower$Species=="setosa", ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
14          4.3         3.0          1.1         0.1  setosa
15          5.8         4.0          1.2         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
17          5.4         3.9          1.3         0.4  setosa
18          5.1         3.5          1.4         0.3  setosa
19          5.7         3.8          1.7         0.3  setosa
20          5.1         3.8          1.5         0.3  setosa
21          5.4         3.4          1.7         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
23          4.6         3.6          1.0         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
25          4.8         3.4          1.9         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
28          5.2         3.5          1.5         0.2  setosa
29          5.2         3.4          1.4         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
31          4.8         3.1          1.6         0.2  setosa
32          5.4         3.4          1.5         0.4  setosa
33          5.2         4.1          1.5         0.1  setosa
34          5.5         4.2          1.4         0.2  setosa
35          4.9         3.1          1.5         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
38          4.9         3.6          1.4         0.1  setosa
39          4.4         3.0          1.3         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
41          5.0         3.5          1.3         0.3  setosa
42          4.5         2.3          1.3         0.3  setosa
43          4.4         3.2          1.3         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
45          5.1         3.8          1.9         0.4  setosa
46          4.8         3.0          1.4         0.3  setosa
47          5.1         3.8          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa

Demo (Continued)

two_sepal_width <- flower$Sepal.Width * 2

flower_new <- data.frame(flower,two_sepal_width)

head(flower_new, n=3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species two_sepal_width
1          5.1         3.5          1.4         0.2  setosa             7.0
2          4.9         3.0          1.4         0.2  setosa             6.0
3          4.7         3.2          1.3         0.2  setosa             6.4
ncol(flower)
[1] 5
ncol(flower_new)
[1] 6

Creating our own Data Frame

Create our own data frame using the tibble function (the older data_frame function is deprecated)

library(tidyverse)

mydf <- tibble(NUMS = 1:5, 
               lets = letters[1:5],
               vehicle = c("car", "boat", "car", "car", "boat"))
mydf
# A tibble: 5 × 3
   NUMS lets  vehicle
  <int> <chr> <chr>  
1     1 a     car    
2     2 b     boat   
3     3 c     car    
4     4 d     car    
5     5 e     boat   

Renaming columns

We can use the names function to set that first column to lowercase:

names(mydf)[1] <- "nums"
mydf
# A tibble: 5 × 3
   nums lets  vehicle
  <int> <chr> <chr>  
1     1 a     car    
2     2 b     boat   
3     3 c     car    
4     4 d     car    
5     5 e     boat   

Your Turn

  1. Make a data frame with column 1: 1,2,3,4,5,6 and column 2: a,b,a,b,a,b

  2. Select only rows with value "a" in column 2 using logical vector

  3. Rename both column 1 and column 2 to something of your choosing.

  4. mtcars is a built-in data set like iris: Extract the 4th row of the mtcars data.

  5. Create another column in the flower data frame, that is the sum of Sepal Width, Sepal Length, Petal Width and Petal Length.

  • Hint: use “+” to add the columns row by row; the sum function would collapse everything into a single total. There are better ways to do this with the dplyr package, but for now we are just practicing adding new columns.
  6. Step Further. Create another column in the flower data frame, that is the sum of Sepal Width, Sepal Length, Petal Width and Petal Length, where Sepal Length is greater than 6.

Lists

  • Lists are a structured collection of R objects

  • R objects in a list need not be the same type

  • Create lists using the list function

  • Lists indexed using double square brackets [[ ]] to select an object

  • Use single square brackets to select two or more list elements. e.g. [c(2,4)]

  • For named lists, can select a list element with $ like data frames
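
The three ways of indexing a list can be compared on a small named list. This is a sketch only: the list mixed and its element names are hypothetical.

```r
# A hypothetical named list holding three different object types
mixed <- list(nums = c(1, 2, 3),
              word = "hello",
              flags = c(TRUE, FALSE))

first_obj <- mixed[[1]]      # double brackets: the numeric vector itself
by_dollar <- mixed$word      # $ selects an element by name
sub_list  <- mixed[c(1, 3)]  # single brackets: a shorter list of two elements

class(first_obj)   # "numeric"
class(mixed[1])    # "list": single brackets always return a list
```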

List Example

Creating a list containing a vector and a matrix:

mylist <- list(matrix(letters[1:10], nrow = 2, ncol = 5),
               seq(0, 49, by = 7))
mylist
[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,] "a"  "c"  "e"  "g"  "i" 
[2,] "b"  "d"  "f"  "h"  "j" 

[[2]]
[1]  0  7 14 21 28 35 42 49

Use indexing to select the second list element:

mylist[[2]]
[1]  0  7 14 21 28 35 42 49

Your Turn

  1. Create a list containing a vector and a 2x3 data frame

  2. Use indexing to select the data frame from your list

  3. Use further indexing to select the first row from the data frame in your list

Examining Objects

  • head(x) - View top 6 rows of a data frame

  • tail(x) - View bottom 6 rows of a data frame

  • summary(x) - Summary statistics

  • str(x) - View structure of object

  • dim(x) - View dimensions of object

  • length(x) - Returns the length of a vector
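
A quick sketch of these functions on the built-in mtcars data. One thing to watch for: on a data frame, length counts columns, so the number of rows comes from nrow, from dim, or from the length of a single column.

```r
dims <- dim(mtcars)           # rows, then columns
dims                          # 32 11

n_rows <- nrow(mtcars)        # 32
n_cols <- length(mtcars)      # 11: length of a data frame is its column count
n_vals <- length(mtcars$mpg)  # 32: length of one column vector

tail(mtcars, n = 3)           # last three rows instead of the default six
```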

Examining Objects Example

We can examine the first two rows of a data frame by passing the n parameter to the head function:

head(iris, n = 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

What’s its structure?

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Your Turn

  1. View the top 8 rows of mtcars data

  2. What type of object is the mtcars data set?

  3. How many rows are in iris data set? (Try finding this using dim or indexing + length)

  4. Summarize the values in each column in iris data set

Working with Output from a Function

  • The output from a function can be saved as an object

  • The object can be any type (data frame, vector, etc.) but is often a list object

  • Items from the output can be extracted for further computing

  • The output object can be examined using functions like str(x)

Saving Output Demo

  • t-test using iris data to see if petal lengths for setosa and versicolor are the same

  • t.test function can only handle two groups, so we subset out the virginica species

t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])

    Welch Two Sample t-test

data:  Petal.Length by Species
t = -39.493, df = 62.14, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
 -2.939618 -2.656382
sample estimates:
    mean in group setosa mean in group versicolor 
                   1.462                    4.260 

Demo (Continued)

Save the output of the t-test to an object

tout <- t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])

Let’s look at the structure of this object:

str(tout)
List of 10
 $ statistic  : Named num -39.5
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 62.1
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 9.93e-46
 $ conf.int   : num [1:2] -2.94 -2.66
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 1.46 4.26
  ..- attr(*, "names")= chr [1:2] "mean in group setosa" "mean in group versicolor"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means between group setosa and group versicolor"
 $ stderr     : num 0.0708
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "Petal.Length by Species"
 - attr(*, "class")= chr "htest"

Demo: Extracting the P-Value

Since this is simply a list, we can use our regular indexing:

#pvalue
tout$p.value
[1] 9.934433e-46
tout[[3]]
[1] 9.934433e-46
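
Other items can be pulled out of the same object in the same way and used in further calculations. A minimal sketch, refitting the test so the block runs on its own; diff_means is just an illustrative name:

```r
tout <- t.test(Petal.Length ~ Species,
               data = iris[iris$Species != "virginica", ])

tout$conf.int            # the 95 percent confidence interval
tout$estimate            # the two group means
unname(tout$statistic)   # the t statistic without its "t" label

# Extracted items can feed further computation,
# e.g. the difference in means (setosa minus versicolor)
diff_means <- unname(tout$estimate[1] - tout$estimate[2])
diff_means               # -2.798
```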

Importing Data

We often need to import our own data rather than just using built-in datasets.

  • First need to find where you have your file saved.

    • Think back to our discussion about working directories and RStudio projects in the previous slides
  • Data read in using R functions such as:

    • read.table() for reading in .txt files

    • read.csv() for reading in .csv files

    • read_excel() from the readxl package for .xlsx files

  • Assign the data to new R object when reading in the file
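
The steps above can be sketched as a round trip: write a small file, check that R can find it, then read it back into a new object. The temporary folder and the file name example.csv are assumptions made so the example runs anywhere; with your own data you would point to the file inside your working directory instead.

```r
getwd()   # the folder R searches when given a relative file name

# Hypothetical file location inside R's temporary folder
csv_path <- file.path(tempdir(), "example.csv")

# row.names = FALSE avoids writing the row numbers as an extra column
write.csv(iris[1:5, ], csv_path, row.names = FALSE)

file.exists(csv_path)            # TRUE once the file has been written

imported <- read.csv(csv_path)   # assign the result to a new R object
head(imported)
```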

Importing Data Demo

We first create a csv file (We can use a text editor or MS Excel)

Then we load it in:

# df = iris[1:10, 1:5]
# write.csv(df, 'tips.csv')

littledata <- read.csv("tips.csv", header = TRUE)

head(littledata)

I like to use head() to make sure my data were read in the way I expected.

Your Turn

  • Make 5 rows of data in an Excel spreadsheet and save it as a tab-delimited .txt file (or use yourturndata.txt on the website).

  • Import this new .txt file into R with read.table. You may need to look at the help page for read.table in order to properly do this.

  • If you want to try a csv file, try reading in the tips.csv file yourself.