Commonly used R functions are installed with base R
R packages containing more specialized R functions can be installed freely from CRAN servers using the function install.packages()
After packages are installed, their functions can be loaded into the current R session using the function library()
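As a minimal sketch of the install-then-load workflow, the snippet below uses the MASS package purely as an illustration (an assumption; it ships with most R installations, so the install step is normally skipped). Any CRAN package name could be substituted.

```r
# Install only if the package is missing (installing needs an internet connection
# and only has to happen once per machine).
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load the package; this has to be repeated in every new R session.
library(MASS)

# The package now appears on the search path, so its functions can be called directly.
"package:MASS" %in% search()
```

Installing puts the package on disk; library() only makes an already installed package available in the current session.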
How do I locate a package with the desired function?
Google (“R project” + search term works well)
R website task views to search relevant subjects: http://cran.r-project.org/web/views/
??searchterm will search R help for pages related to the search term
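A small sketch of searching the installed help pages from code. The search term "regression" and the restriction to the stats package are illustrative choices, not part of the original material; ??regression typed at the console is shorthand for the help.search() call.

```r
# Search the help pages of the stats package for the term "regression".
hits <- help.search("regression", package = "stats")

# The matching help topics are stored in a data frame.
head(hits$matches)

# If the function name is already known, a single ? opens its help page,
# for example: ?t.test
```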
List of some Handy R Packages - Definitely NOT comprehensive
ggplot2: Statistical graphics
tidyverse: Manipulating data structures (includes the dplyr, tidyr, purrr, tibble, etc. packages)
lme4: Mixed models
knitr: Integrate LaTeX, HTML, or Markdown with R for easy reproducible research
vegan: Ordination methods, diversity analysis and other functions for community and vegetation ecologists
phyloseq: Handling and analysis of high-throughput microbiome census data
ggtree: Visualization and annotation of phylogenetic trees with their covariates and other associated data
caret: The R equivalent of scikit-learn: train/test split, cross-validation, model performance metrics
Code Skeleton:
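As a minimal sketch, the general shape of an R function definition looks like the following. The names function_name, arg1, and arg2 are placeholders chosen for illustration, and the addition in the body is only there so the skeleton runs.

```r
function_name <- function(arg1, arg2) {
  # body: code that uses the arguments to compute a result
  result <- arg1 + arg2
  # output: the value handed back to the caller
  return(result)
}

# Calling the function with concrete values for the arguments
function_name(2, 3)
```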
Example:
mymean <- function(data) {
  ans <- sum(data) / length(data)
  return(ans)
}
set.seed(4) # fix the seed so sample() returns the same random numbers on every run
example <- sample(1:200, 15) # 15 random values between 1 and 200, without replacement; without set.seed() the values differ on each run
mymean(example) # mean of the randomly generated data
[1] 110.7333
mymean: the function name
data: the argument (the data you want to calculate the mean of)
ans <- sum(data) / length(data): the body (the code you run to get the desired output)
ans: the returned value (the mean of the data)
Skeleton:
Basic Example:
Example within a Function:
Reducing the amount of typing we do can be nice
If we have a lot of code that is essentially the same we can take advantage of looping.
R offers several loops: for, while, repeat.
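To make the three loop types concrete, here is a small sketch in which each loop adds up the numbers 1 to 4. The variable names (total_for, total_while, total_repeat) are made up for this illustration.

```r
# for: run the body once for each element of a known sequence
total_for <- 0
for (i in 1:4) {
  total_for <- total_for + i
}

# while: keep running the body as long as the condition is TRUE
total_while <- 0
i <- 1
while (i <= 4) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat: run the body until an explicit break is reached
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 4) break
}

# All three loops arrive at the same total
c(total_for, total_while, total_repeat)
```

A for loop is usually the simplest choice when the number of iterations is known in advance; while and repeat fit cases where the stopping point depends on what happens inside the loop.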
The test_expression is i < 5, which is TRUE on the first pass because i is 1 and 1 is less than 5. So the body of the loop is entered, i is printed, and i is incremented (i + 1).
This continues until i takes the value 5. The condition 5 < 5 is FALSE, so the while loop exits.
Make sure the condition will eventually become FALSE; otherwise the loop will run forever.
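The walkthrough above corresponds to a loop along these lines. This is a sketch reconstructed from the description, assuming i starts at 1.

```r
i <- 1
while (i < 5) {   # test_expression: checked before every pass
  print(i)        # prints 1, 2, 3, 4 on successive passes
  i <- i + 1      # without this increment the condition would never become FALSE
}
i                 # the loop stopped as soon as i reached 5
```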
Create a function that takes numeric input and returns the mean and a 95% confidence interval for the mean of the data (the t.test function, which reports a confidence interval, could be useful; see the example in 4-DataStructures.R)
Add checks to your function to make sure the data is either numeric or logical. If it is logical convert it to numeric.
The diamonds data set is included in the ggplot2 package and is well known as a convenient data set for examples. It can be read into your environment with the function data("diamonds", package = "ggplot2"). Loop over the columns of the diamonds data set and apply your function to all of the numeric columns.
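One possible approach to these exercises is sketched below. It is demonstrated on the built-in iris data set rather than diamonds, so it runs without ggplot2 installed. The function name mean_ci and its output layout are assumptions made for this illustration, and the interval relies on the default 95% confidence level of t.test().

```r
mean_ci <- function(x) {
  # Logical input is converted to numeric (TRUE becomes 1, FALSE becomes 0)
  if (is.logical(x)) {
    x <- as.numeric(x)
  }
  # Anything that is still not numeric is rejected
  if (!is.numeric(x)) {
    stop("x must be numeric or logical")
  }
  # t.test() uses conf.level = 0.95 by default; conf.int holds the two bounds
  ci <- t.test(x)$conf.int
  c(mean = mean(x), lower = ci[1], upper = ci[2])
}

# Loop over the columns and apply the function to the numeric ones only
for (col in names(iris)) {
  if (is.numeric(iris[[col]])) {
    print(col)
    print(mean_ci(iris[[col]]))
  }
}
```

The same loop should work on diamonds after loading it with data("diamonds", package = "ggplot2"), replacing iris with diamonds.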