A matrix is a rectangular arrangement of numbers in rows and columns.
Some functions in R require your data to be arranged as a matrix.
# Creating a matrix (3x3)
row_matrix <- matrix(
  # Taking sequence of elements
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  # Number of rows
  nrow = 3,
  # Number of columns
  ncol = 3,
  # By default matrices are filled in column-wise order,
  # so this parameter decides how to arrange the matrix
  byrow = TRUE
)
row_matrix
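To see what byrow changes, the following minimal sketch fills the same nine values both ways. The object names by_col and by_row are illustrative and do not come from the slides.

```r
# Same values, filled column by column (the default, byrow = FALSE)
by_col <- matrix(1:9, nrow = 3, ncol = 3)

# Same values, filled row by row
by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

by_col[1, ]  # first row of the column-filled matrix: 1 4 7
by_row[1, ]  # first row of the row-filled matrix: 1 2 3

# Each matrix is the transpose of the other
identical(t(by_col), by_row)
```

For a square matrix like this one, switching byrow is equivalent to transposing the result.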
# A tibble: 5 × 3
   NUMS lets  vehicle
  <int> <chr> <chr>
1     1 a     car
2     2 b     boat
3     3 c     car
4     4 d     car
5     5 e     boat
Renaming columns
We can use the names function to set that first column to lowercase:
names(mydf)[1] <- "nums"
mydf
# A tibble: 5 × 3
   nums lets  vehicle
  <int> <chr> <chr>
1     1 a     car
2     2 b     boat
3     3 c     car
4     4 d     car
5     5 e     boat
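Renaming by position breaks if the column order changes, so it can help to match on the current name instead. The sketch below assumes a plain data frame called example_df, a hypothetical stand-in for mydf built from the values printed above. The new names "letters" and "transport" are also illustrative.

```r
# Hypothetical stand-in for mydf, using the values shown in the output above
example_df <- data.frame(
  NUMS = 1:5,
  lets = c("a", "b", "c", "d", "e"),
  vehicle = c("car", "boat", "car", "car", "boat")
)

# Rename by matching the current name rather than the position
names(example_df)[names(example_df) == "NUMS"] <- "nums"

# Rename several columns at once by assigning a full vector of names
names(example_df) <- c("nums", "letters", "transport")

names(example_df)
```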
Your Turn
Make a data frame with column 1: 1,2,3,4,5,6 and column 2: a,b,a,b,a,b.
Select only the rows with value "a" in column 2 using a logical vector.
Rename both column 1 and column 2 to something of your choosing.
mtcars is a built-in data set, like iris. Extract the 4th row of the mtcars data.
Create another column in the flower data frame, that is the sum of Sepal Width, Sepal Length, Petal Width and Petal Length.
Hint: use “+” to add the columns rather than the sum function, which returns a single grand total. The dplyr package offers better ways to do this, but the goal here is to practice adding new columns.
Step Further. Create another column in the flower data frame, that is the sum of Sepal Width, Sepal Length, Petal Width and Petal Length, where Sepal Length is greater than 6.
Lists
Lists are a structured collection of R objects
R objects in a list need not be the same type
Create lists using the list function
Lists are indexed using double square brackets [[ ]] to select a single object
Use single square brackets to select two or more list elements, e.g. [c(2,4)]
For named lists, you can select a list element with $, as with data frames
List Example
Creating a list containing a vector and a matrix:
mylist <- list(
  matrix(letters[1:10], nrow = 2, ncol = 5),
  seq(0, 49, by = 7)
)
mylist
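The indexing rules for lists can be sketched with a named version of the same list. The object name named_list and the element names letters_mat and sevens are assumptions made for illustration.

```r
# A named list; the element names are illustrative, not from the slides
named_list <- list(
  letters_mat = matrix(letters[1:10], nrow = 2, ncol = 5),
  sevens = seq(0, 49, by = 7)
)

# Double brackets return the object itself (here, a numeric vector)
named_list[[2]]

# Single brackets return a list, even when selecting one element
class(named_list[2])

# $ selects a named element
named_list$sevens

# Indexing can be chained: row 1, column 3 of the matrix is "e"
named_list$letters_mat[1, 3]
```

The difference between [[ ]] and [ ] matters in practice: functions that expect a vector, such as mean, work on named_list[[2]] but not on named_list[2], because the latter is still a list.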
How many rows are in the iris data set? (Try finding this using dim or indexing + length)
Summarize the values in each column of the iris data set
Working with Output from a Function
The output from a function can be saved as an object
The object can be any type (data frame, vector, etc.) but is often a list object
Items from the output can be extracted for further computing
The output object can be examined using functions like str(x)
Saving Output Demo
t-test using iris data to see whether the mean petal lengths for setosa and versicolor are the same
The t.test function can only handle two groups, so we exclude the virginica species
t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])
Welch Two Sample t-test
data: Petal.Length by Species
t = -39.493, df = 62.14, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
-2.939618 -2.656382
sample estimates:
    mean in group setosa mean in group versicolor
                   1.462                    4.260
Demo (Continued)
Save the output of the t-test to an object
tout <- t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])
Let’s look at the structure of this object:
str(tout)
List of 10
$ statistic : Named num -39.5
..- attr(*, "names")= chr "t"
$ parameter : Named num 62.1
..- attr(*, "names")= chr "df"
$ p.value : num 9.93e-46
$ conf.int : num [1:2] -2.94 -2.66
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num [1:2] 1.46 4.26
..- attr(*, "names")= chr [1:2] "mean in group setosa" "mean in group versicolor"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means between group setosa and group versicolor"
$ stderr : num 0.0708
$ alternative: chr "two.sided"
$ method : chr "Welch Two Sample t-test"
$ data.name : chr "Petal.Length by Species"
- attr(*, "class")= chr "htest"
Demo: Extracting the P-Value
Since this is simply a list, we can use our regular indexing:
# p-value
tout$p.value
[1] 9.934433e-46
tout[[3]]
[1] 9.934433e-46
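The same indexing works for the other elements listed by str(tout). As a minimal sketch, the code below repeats the t-test so that it runs on its own, then extracts the confidence interval and the group means. The name mean_diff is an assumption introduced for illustration.

```r
tout <- t.test(Petal.Length ~ Species,
               data = iris[iris$Species != "virginica", ])

# Confidence interval for the difference in means (numeric vector of length 2)
tout$conf.int

# Group means: a named vector, so elements can be selected by name
tout$estimate["mean in group setosa"]

# Extracted values can feed further computation,
# for example the difference between the two group means
mean_diff <- unname(tout$estimate[1] - tout$estimate[2])
mean_diff
```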
Importing Data
We often need to import our own data rather than just using built-in datasets.
First, find where your file is saved.
Think back to our discussion about Working Directories and RStudio Projects in the previous slides
Data read in using R functions such as:
read.table() for reading in .txt files
read.csv() for reading in .csv files
read_excel() from the readxl package for .xlsx files
Assign the data to a new R object when reading in the file
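As a rough sketch of the read-and-assign pattern, the code below writes a small CSV to a temporary location so it runs without any existing file, then reads it back. The path csv_path, the object names my_data and my_data2, and the file contents are all hypothetical.

```r
# Hypothetical file: write a small CSV to a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, group = c("a", "b", "a")),
          csv_path, row.names = FALSE)

# Read the file and assign the result to a new object
my_data <- read.csv(csv_path)

# read.table() can read the same file when given the header and separator
my_data2 <- read.table(csv_path, header = TRUE, sep = ",")

# Check what was read in
str(my_data)
```

With your own data, replace csv_path with the path to your file, given relative to the working directory or as a full path.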
Importing Data Demo
We first create a CSV file (using a text editor or MS Excel)