One Way ANOVA
Two-way ANOVA
Qualitative/Qualitative
Quantitative/Qualitative
Blocking
A plant pathology researcher is studying the effect of an experimental fungicide on the fresh weight of azaleas inoculated with a certain fungus. She has 49 potted plants of a particular azalea variety growing in a greenhouse. All plants are inoculated with a fungus. To start, she has 7 different fungicide combinations, with 7 plants per treatment.(Example from Lentner and Bishop, Experimental Design and Analysis (2nd))
Note: As we go along, we will continue to use this scenario; we will just continue to get more specific on what the treatments are. The dataset will be updated accordingly as well.
Essentially an expansion of the t-test Source
Compare more than 2 levels of a treatment factor
Treatment factor is your independent variable, the variable you expect to possibly cause the variation we see in the response. (i.e. the fungicide combination)
Levels are the subdivisions of your overall treatment factor. (i.e. the 7 different combinations)
If you only have two levels of a factor, a One-Way ANOVA would be equivalent to t-test.
Randomly sample your experimental units, and then randomly assign them to the different levels of the treatment factor.
\[y_{ij} = \mu + \tau_i + \varepsilon_{ij}\]
In terms of our example:
\[H_0: \mu_1 = \mu_2 = ... = \mu_n\] \[H_A: \text{at least one } \mu_i \text{ is different}\]
OR
\[H_0: \tau_1 = \tau_2 = ... = \tau_n = 0\] \[H_A: \text{at least one } \tau_i \ne 0\]
These hypotheses are equivalent, just being expressed in two different ways.
library(car)
#mod.fit <- lm(response ~ trt, data = data)
mod.fit <- lm(Weight ~ Trt, data = azalea)
Anova(mod.fit, type = "III")
Anova Table (Type III tests)
Response: Weight
Sum Sq Df F value Pr(>F)
(Intercept) 14014.7 1 387.6348 <2e-16 ***
Trt 11.3 1 0.3117 0.5793
Residuals 1699.3 47
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Trt
to a factor!library(car)
#mod.fit <- lm(response ~ trt, data = data)
azalea$Trt.new <- as.factor(azalea$Trt)
mod.fit <- lm(Weight ~ Trt.new, data = azalea)
Anova(mod.fit, type = "III")
Anova Table (Type III tests)
Response: Weight
Sum Sq Df F value Pr(>F)
(Intercept) 9731.6 1 297.7812 <2e-16 ***
Trt.new 338.0 6 1.7236 0.1392
Residuals 1372.6 42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans
emmeans
results$emmeans
Trt.new emmean SE df lower.CL upper.CL
1 37.3 2.16 42 32.9 41.6
2 38.3 2.16 42 33.9 42.6
3 38.7 2.16 42 34.4 43.1
4 43.4 2.16 42 39.1 47.8
5 35.1 2.16 42 30.8 39.5
6 37.0 2.16 42 32.6 41.4
7 41.6 2.16 42 37.2 45.9
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Trt.new1 - Trt.new2 -1.000 3.06 42 -0.327 0.9999
Trt.new1 - Trt.new3 -1.429 3.06 42 -0.468 0.9991
Trt.new1 - Trt.new4 -6.143 3.06 42 -2.010 0.4236
Trt.new1 - Trt.new5 2.143 3.06 42 0.701 0.9918
Trt.new1 - Trt.new6 0.286 3.06 42 0.094 1.0000
Trt.new1 - Trt.new7 -4.286 3.06 42 -1.403 0.7974
Trt.new2 - Trt.new3 -0.429 3.06 42 -0.140 1.0000
Trt.new2 - Trt.new4 -5.143 3.06 42 -1.683 0.6307
Trt.new2 - Trt.new5 3.143 3.06 42 1.029 0.9444
Trt.new2 - Trt.new6 1.286 3.06 42 0.421 0.9995
Trt.new2 - Trt.new7 -3.286 3.06 42 -1.075 0.9319
Trt.new3 - Trt.new4 -4.714 3.06 42 -1.543 0.7179
Trt.new3 - Trt.new5 3.571 3.06 42 1.169 0.9018
Trt.new3 - Trt.new6 1.714 3.06 42 0.561 0.9976
Trt.new3 - Trt.new7 -2.857 3.06 42 -0.935 0.9645
Trt.new4 - Trt.new5 8.286 3.06 42 2.712 0.1204
Trt.new4 - Trt.new6 6.429 3.06 42 2.104 0.3693
Trt.new4 - Trt.new7 1.857 3.06 42 0.608 0.9962
Trt.new5 - Trt.new6 -1.857 3.06 42 -0.608 0.9962
Trt.new5 - Trt.new7 -6.429 3.06 42 -2.104 0.3693
Trt.new6 - Trt.new7 -4.571 3.06 42 -1.496 0.7455
P value adjustment: tukey method for comparing a family of 7 estimates
Suppose you find out from the researchers that are actually only two fungicides used. Treatment 1 is a standard fungicide, Truban (T), and is used as a control in the experiment. Treatments 2-7 are the experimental fungicide, Nurell (N), that is applied at different times and rates. The researchers want to compare T and N to see if there is a difference between these two fungicide treatments.
1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|
T | N | N | N | N | N | N |
\(\mu_1\) | \(\mu_2\) | \(\mu_3\) | \(\mu_4\) | \(\mu_5\) | \(\mu_6\) | \(\mu_7\) |
\[H_0: \mu_1 = \frac{\mu_2 + \mu_3 + \mu_4 + \mu_5 + \mu_6 + \mu_7}{6}\]
\[H_A: \mu_1 \ne \frac{\mu_2 + \mu_3 + \mu_4 + \mu_5 + \mu_6 + \mu_7}{6}\]
Estimate
Trt.new c=( 1 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 ) -1.738095
Std. Error
Trt.new c=( 1 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 ) 2.333819
t value
Trt.new c=( 1 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 ) -0.7447429
Pr(>|t|)
Trt.new c=( 1 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 -0.166666666666667 ) 0.4605743
attr(,"class")
[1] "fit_contrast"
Estimate
Trt.new c=( 6 -1 -1 -1 -1 -1 -1 ) -10.42857
Std. Error
Trt.new c=( 6 -1 -1 -1 -1 -1 -1 ) 14.00292
t value
Trt.new c=( 6 -1 -1 -1 -1 -1 -1 ) -0.7447429
Pr(>|t|)
Trt.new c=( 6 -1 -1 -1 -1 -1 -1 ) 0.4605743
attr(,"class")
[1] "fit_contrast"
Now that the researchers know there isn’t a difference between the T and N fungicides, they want to know more about the N fungicide specifically. You learn that this fungicide was applied either 3 days before inoculation or 7 days after inoculation. Additionally, it is applied at either the 1 oz., 3 oz., or 5 oz. dose rates.
N | N | N | N | N | N |
---|---|---|---|---|---|
3 days before | 3 days before | 3 days before | 7 days after | 7 days after | 7 days after |
1 oz. | 3 oz. | 5 oz. | 1 oz. | 3 oz. | 5 oz. |
Now, instead of a single treatment factor, we have 2! We have time of application and rate of application. Thus, a One-Way ANOVA will no longer suffice.
Extension of One-Way ANOVA, we just now have two treatment factors instead of one.
When dealing with more than one treatment factor, our treatment design is now a factorial.
You have an \(i \times j\) factorial
We are interested in knowing if the two treatment factors interact in their effect on the response.
\[y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}\]
In terms of our example:
\[H_0: \text{There is no interaction.}\] \[H_A: \text{There is an interaction.}\]
If we reject the null hypothesis and conclude there is an interaction, we proceed by looking at the simple effects.
If we fail to reject the null hypothesis and conclude there is no evidence of an interaction, we proceed by looking at the main effects.
azalea2$AppTime.new <- as.factor(azalea2$AppTime)
azalea2$AppRate.new <- as.factor(azalea2$AppRate)
mod.fit2 <- lm(Weight ~ AppTime.new + AppRate.new + AppTime.new:AppRate.new, data = azalea2)
Anova(mod.fit2, type = "III")
Anova Table (Type III tests)
Response: Weight
Sum Sq Df F value
(Intercept) 10260.6 1 291.9675
AppTime.new 34.6 1 0.9837
AppRate.new 114.0 2 1.6220
AppTime.new:AppRate.new 4.3 2 0.0617
Residuals 1265.1 36
Pr(>F)
(Intercept) <2e-16 ***
AppTime.new 0.3279
AppRate.new 0.2116
AppTime.new:AppRate.new 0.9403
Residuals
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans
.$emmeans
AppRate.new emmean SE df lower.CL upper.CL
1 36.7 1.58 36 33.5 39.9
3 37.9 1.58 36 34.6 41.1
5 42.5 1.58 36 39.3 45.7
Results are averaged over the levels of: AppTime.new
Confidence level used: 0.95
$contrasts
contrast estimate SE df
AppRate.new1 - AppRate.new3 -1.14 2.24 36
AppRate.new1 - AppRate.new5 -5.79 2.24 36
AppRate.new3 - AppRate.new5 -4.64 2.24 36
t.ratio p.value
-0.510 0.8669
-2.582 0.0365
-2.072 0.1100
Results are averaged over the levels of: AppTime.new
P value adjustment: tukey method for comparing a family of 3 estimates
Still use the emmeans
function!
In the pairwise
part of the argument, use the vertical bar, |, to denote which treatment factor you want to keep constant.
$emmeans
AppTime.new = 1:
AppRate.new emmean SE df lower.CL upper.CL
1 38.3 2.24 36 33.7 42.8
3 38.7 2.24 36 34.2 43.3
5 43.4 2.24 36 38.9 48.0
AppTime.new = 2:
AppRate.new emmean SE df lower.CL upper.CL
1 35.1 2.24 36 30.6 39.7
3 37.0 2.24 36 32.5 41.5
5 41.6 2.24 36 37.0 46.1
Confidence level used: 0.95
$contrasts
AppTime.new = 1:
contrast estimate SE df
AppRate.new1 - AppRate.new3 -0.429 3.17 36
AppRate.new1 - AppRate.new5 -5.143 3.17 36
AppRate.new3 - AppRate.new5 -4.714 3.17 36
t.ratio p.value
-0.135 0.9900
-1.623 0.2492
-1.488 0.3086
AppTime.new = 2:
contrast estimate SE df
AppRate.new1 - AppRate.new3 -1.857 3.17 36
AppRate.new1 - AppRate.new5 -6.429 3.17 36
AppRate.new3 - AppRate.new5 -4.571 3.17 36
t.ratio p.value
-0.586 0.8284
-2.029 0.1198
-1.443 0.3303
P value adjustment: tukey method for comparing a family of 3 estimates
$emmeans
AppRate.new = 1:
AppTime.new emmean SE df lower.CL upper.CL
1 38.3 2.24 36 33.7 42.8
2 35.1 2.24 36 30.6 39.7
AppRate.new = 3:
AppTime.new emmean SE df lower.CL upper.CL
1 38.7 2.24 36 34.2 43.3
2 37.0 2.24 36 32.5 41.5
AppRate.new = 5:
AppTime.new emmean SE df lower.CL upper.CL
1 43.4 2.24 36 38.9 48.0
2 41.6 2.24 36 37.0 46.1
Confidence level used: 0.95
$contrasts
AppRate.new = 1:
contrast estimate SE df
AppTime.new1 - AppTime.new2 3.14 3.17 36
t.ratio p.value
0.992 0.3279
AppRate.new = 3:
contrast estimate SE df
AppTime.new1 - AppTime.new2 1.71 3.17 36
t.ratio p.value
0.541 0.5918
AppRate.new = 5:
contrast estimate SE df
AppTime.new1 - AppTime.new2 1.86 3.17 36
t.ratio p.value
0.586 0.5615
In the previous analysis, we treated application rate as a qualitative/categorical variable, meaning the researchers are only interested in the levels of 1 oz., 3 oz., and 5 oz. You learn the researchers actually want to be able to predict the fresh weight of azaleas that were given N at the 2 oz. rate.
Before, we were analyzing a qualitative/qualitative factorial, where both treatment factors are qualitative.
Now, we want to change the analysis to a qualitative/quantitative factorial.
Time of application remains a qualitative variable.
Application rate is updated to a quantitative.
We are now adding in regression to our factorial analysis.
Want to know if the same line can be fit for the two levels time of application
In other words, does the time of application effect the prediction of fresh weight of azaleas for the rate of application?
In other, other words, is there an interaction?
Our hypotheses are very similar to what they were earlier. We just now need to add in a test of shape for the regression lines.
To determine the maximum power of your regression line, take the number of levels of your quantitative factor and subtract 1.
So for our example, we have 3 levels. Thus, we can fit up to a maximum power of 2 (a quadratic).
mod.fit3 <- aov(Weight ~ AppTime.new + AppRate.new + AppTime.new:AppRate.new, data = azalea2)
summary(mod.fit3, split=list(AppRate.new = list(Linear=1, Quad=2)))
Df Sum Sq
AppTime.new 1 52.6
AppRate.new 2 262.9
AppRate.new: Linear 1 28.6
AppRate.new: Quad 1 234.3
AppTime.new:AppRate.new 2 4.3
AppTime.new:AppRate.new: Linear 1 1.4
AppTime.new:AppRate.new: Quad 1 2.9
Residuals 36 1265.1
Mean Sq F value
AppTime.new 52.60 1.497
AppRate.new 131.45 3.741
AppRate.new: Linear 28.58 0.813
AppRate.new: Quad 234.32 6.668
AppTime.new:AppRate.new 2.17 0.062
AppTime.new:AppRate.new: Linear 1.44 0.041
AppTime.new:AppRate.new: Quad 2.89 0.082
Residuals 35.14
Pr(>F)
AppTime.new 0.2291
AppRate.new 0.0334 *
AppRate.new: Linear 0.3731
AppRate.new: Quad 0.0140 *
AppTime.new:AppRate.new 0.9403
AppTime.new:AppRate.new: Linear 0.8407
AppTime.new:AppRate.new: Quad 0.7758
Residuals
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = Weight ~ AppRate + I(AppRate^2), data = azalea2)
Residuals:
Min 1Q Median 3Q Max
-13.7143 -3.3393 0.3929 4.0357 9.5000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.4554 3.5548 10.537 5.7e-13
AppRate -1.1786 2.9112 -0.405 0.688
I(AppRate^2) 0.4375 0.4764 0.918 0.364
(Intercept) ***
AppRate
I(AppRate^2)
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.822 on 39 degrees of freedom
Multiple R-squared: 0.1659, Adjusted R-squared: 0.1231
F-statistic: 3.878 on 2 and 39 DF, p-value: 0.02911
For 3 days before: \(y = 39.0565 - 1.3393*rate + 0.4375*rate^2\)
For 7 days after: \(y = 35.8541 - 1.0179*rate + 0.4375*rate^2\)
mod.fit5 <- lm(Weight ~ AppTime.new + AppRate + I(AppRate^2) + AppTime.new:AppRate, data = azalea2)
summary(mod.fit5)
Call:
lm(formula = Weight ~ AppTime.new + AppRate + I(AppRate^2) +
AppTime.new:AppRate, data = azalea2)
Residuals:
Min 1Q Median 3Q Max
-12.2738 -4.0119 0.2798 4.4702 9.7262
Coefficients:
Estimate Std. Error t value
(Intercept) 39.0565 4.0406 9.666
AppTime.new2 -3.2024 3.7767 -0.848
AppRate -1.3393 2.9772 -0.450
I(AppRate^2) 0.4375 0.4788 0.914
AppTime.new2:AppRate 0.3214 1.1057 0.291
Pr(>|t|)
(Intercept) 1.15e-11 ***
AppTime.new2 0.402
AppRate 0.655
I(AppRate^2) 0.367
AppTime.new2:AppRate 0.773
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.851 on 37 degrees of freedom
Multiple R-squared: 0.2009, Adjusted R-squared: 0.1145
F-statistic: 2.325 on 4 and 37 DF, p-value: 0.07446
You learn the researchers actually 7 plots, and they made sure all six treatment combinations of the N fungicide were included. The researchers completed the experiment in a Randomized Complete Block Design! (Note: We are also going to go back to the assumption it is a qualitative/qualitative analysis.)
A block allows us to group experimental units together such that the variance expected within the block is a lot smaller than the variance between blocks.
We typically assume blocks are a random sample from the population, and thus, our sample of blocks is representative of the population of similar constructed blocks.
\[y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}\]
\(b_j \sim N(0,\sigma_b^2)\) is the effect of the \(j^{th}\) block
The rest of the aspects of the model are basically the same as before. Description of subscripts changes.
Two-Way ANOVA
\[y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + b_k + \varepsilon_{ijk}\]
azalea3$AppTime.new <- as.factor(azalea3$AppTime)
azalea3$AppRate.new <- as.factor(azalea3$AppRate)
azalea3$Block.new <- as.factor(azalea3$Block)
library(lme4)
mod.fit6 <- lmer(Weight ~ AppTime.new + AppRate.new + AppTime.new:AppRate.new + (1|Block.new), data = azalea3)
Anova(mod.fit3, type = "III")
Anova Table (Type III tests)
Response: Weight
Sum Sq Df F value
(Intercept) 10260.6 1 291.9675
AppTime.new 34.6 1 0.9837
AppRate.new 114.0 2 1.6220
AppTime.new:AppRate.new 4.3 2 0.0617
Residuals 1265.1 36
Pr(>F)
(Intercept) <2e-16 ***
AppTime.new 0.3279
AppRate.new 0.2116
AppTime.new:AppRate.new 0.9403
Residuals
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1