ANOVA in R: A stepbystep guide
ANOVA is a statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables. ANOVA tests whether there is a difference in means of the groups at each level of the independent variable.
The null hypothesis (H_{0}) of the ANOVA is no difference in means, and the alternate hypothesis (H_{a}) is that the means are different from one another.
In this guide, we will walk you through the process of a oneway ANOVA (one independent variable) and a twoway ANOVA (two independent variables).
Our sample dataset contains observations from an imaginary study of the effects of fertilizer type and planting density on crop yield.
We will also include examples of how to perform and interpret a twoway ANOVA with an interaction term, and an ANOVA with a blocking variable.
Getting started in R
If you haven’t used R before, start by downloading R and R Studio. Once you have both of these programs downloaded, open R Studio and click on File > New File > R Script.
Now you can copy and paste the code from the rest of this example into your script. To run the code, highlight the lines you want to run and click on the Run button on the top right of the text editor (or press ctrl + enter on the keyboard).
Install and load the packages
First, install the packages you will need for the analysis (this only needs to be done once):
Then load these packages into your R environment (do this every time you restart the R program):
Step 1: Load the data into R
Note that this data was generated for this example, it’s not from a real experiment!
We will use the same dataset for all of our examples in this walkthrough. The only difference between the different analyses is how many independent variables we include and in what combination we include them.
It is common for factors to be read as quantitative variables when importing a dataset into R. To avoid this, you can use the read.csv()
command to read in the data, specifying within the command whether each of the variables should be quantitative (“numeric”) or categorical (“factor”).
Use the following code, replacing the path/to/your/file text with the actual path to your file:
Before continuing, you can check that the data has read in correctly:
You should see ‘density’, ‘block’, and ‘fertilizer’ listed as categorical variables with the number of observations at each level (i.e. 48 observations at density 1 and 48 observations at density 2).
‘Yield’ should be a quantitative variable with a numeric summary (minimum, median, mean, maximum).
Step 2: Perform the ANOVA test
ANOVA tests whether any of the group means are different from the overall mean of the data by checking the variance of each individual group against the overall variance of the data. If one or more groups falls outside the range of variation predicted by the null hypothesis (all group means are equal), then the test is statistically significant.
We can perform an ANOVA in R using the aov()
function. This will calculate the test statistic for ANOVA and determine whether there is significant variation among the groups formed by the levels of the independent variable.
Oneway ANOVA
In the oneway ANOVA example, we are modeling crop yield as a function of the type of fertilizer used. First we will use aov()
to run the model, then we will use summary()
to print the summary of the model.
The model summary first lists the independent variables being tested in the model (in this case we have only one, ‘fertilizer’) and the model residuals (‘Residual’). All of the variation that is not explained by the independent variables is called residual variance.
The rest of the values in the output table describe the independent variable and the residuals:
 The Df column displays the degrees of freedom for the independent variable (the number of levels in the variable minus 1), and the degrees of freedom for the residuals (the total number of observations minus one and minus the number of levels in the independent variables).
 The Sum Sq column displays the sum of squares (a.k.a. the total variation between the group means and the overall mean).
 The Mean Sq column is the mean of the sum of squares, calculated by dividing the sum of squares by the degrees of freedom for each parameter.
 The Fvalue column is the test statistic from the F test. This is the mean square of each independent variable divided by the mean square of the residuals. The larger the F value, the more likely it is that the variation caused by the independent variable is real and not due to chance.
 The Pr(>F) column is the pvalue of the Fstatistic. This shows how likely it is that the Fvalue calculated from the test would have occurred if the null hypothesis of no difference among group means were true.
The pvalue of the fertilizer variable is low (p < 0.001), so it appears that the type of fertilizer used has a real impact on the final crop yield.
Twoway ANOVA
In the twoway ANOVA example, we are modeling crop yield as a function of type of fertilizer and planting density. First we use aov()
to run the model, then we use summary()
to print the summary of the model.
Adding planting density to the model seems to have made the model better: it reduced the residual variance (the residual sum of squares went from 35.89 to 30.765), and both planting density and fertilizer are statistically significant (pvalues < 0.001).
Adding interactions between variables
Sometimes you have reason to think that two of your independent variables have an interaction effect rather than an additive effect.
For example, in our crop yield experiment, it is possible that planting density affects the plants’ ability to take up fertilizer. This might influence the effect of fertilizer type in a way that isn’t accounted for in the twoway model.
To test whether two variables have an interaction effect in ANOVA, simply use an asterisk instead of a plussign in the model:
In the output table, the ‘fertilizer:density’ variable has a low sumofsquares value and a high pvalue, which means there is not much variation that can be explained by the interaction between fertilizer and planting density.
Adding a blocking variable
If you have grouped your experimental treatments in some way, or if you have a confounding variable that might affect the relationship you are interested in testing, you should include that element in the model as a blocking variable. The simplest way to do this is just to add the variable into the model with a ‘+’.
For example, in many crop yield studies, treatments are applied within ‘blocks’ in the field that may differ in soil texture, moisture, sunlight, etc. To control for the effect of differences among planting blocks we add a third term, ‘block’, to our ANOVA.
The ‘block’ variable has a low sumofsquares value (0.486) and a high pvalue (p = 0.48), so it’s probably not adding much information to the model. It also doesn’t change the sum of squares for the two independent variables, which means that it’s not affecting how much variation in the dependent variable they explain.
Step 3: Find the bestfit model
There are now four different ANOVA models to explain the data. How do you decide which one to use? Usually you’ll want to use the ‘bestfit’ model – the model that best explains the variation in the dependent variable.
The Akaike information criterion (AIC) is a good test for model fit. AIC calculates the information value of each model by balancing the variation explained against the number of parameters used.
In AIC model selection, we compare the information value of each model and choose the one with the lowest AIC value (a lower number means more information explained!)
The model with the lowest AIC score (listed first in the table) is the best fit for the data:
From these results, it appears that the two.way model is the best fit. The twoway model has the lowest AIC value, and 71% of the AIC weight, which means that it explains 71% of the total variation in the dependent variable that can be explained by the full set of models.
The model with blocking term contains an additional 15% of the AIC weight, but because it is more than 2 deltaAIC worse than the best model, it probably isn’t good enough to include in your results.
Step 4: Check for homoscedasticity
To check whether the model fits the assumption of homoscedasticity, look at the model diagnostic plots in R using the plot()
function:
The output looks like this:
The diagnostic plots show the unexplained variance (residuals) across the range of the observed data.
Each plot gives a specific piece of information about the model fit, but it’s enough to know that the red line representing the mean of the residuals should be horizontal and centered on zero (or on one, in the scalelocation plot), meaning that there are no large outliers that would cause bias in the model.
The normal QQ plot plots a regression between the theoretical residuals of a perfectlyheteroscedastic model and the actual residuals of your model, so the closer to a slope of 1 this is the better. This QQ plot is very close, with only a bit of deviation.
From these diagnostic plots we can say that the model fits the assumption of heteroscedasticity.
If your model doesn’t fit the assumption of homoscedasticity, you can try the KruskallWallis test instead.
Step 5: Do a posthoc test
ANOVA tells us if there are differences among group means, but not what the differences are. To find out which groups are statistically different from one another, you can perform a Tukey’s Honestly Significant Difference (Tukey’s HSD) posthoc test for pairwise comparisons:
From the posthoc test results, we see that there are statistically significant differences (p < 0.05) between fertilizer groups 3 and 1 and between fertilizer types 3 and 2, but the difference between fertilizer groups 2 and 1 is not statistically significant. There is also a significant difference between the two different levels of planting density.
Step 6: Plot the results in a graph
When plotting the results of a model, it is important to display:
 the raw data
 summary information, usually the mean and standard error of each group being compared
 letters or symbols above each group being compared to indicate the groupwise differences.
Find the groupwise differences
From the ANOVA test we know that both planting density and fertilizer type are significant variables. To display this information on a graph, we need to show which of the combinations of fertilizer type + planting density are statistically different from one another.
To do this, we can run another ANOVA + TukeyHSD test, this time using the interaction of fertilizer and planting density. We aren’t doing this to find out if the interaction term is significant (we already know it’s not), but rather to find out which group means are statistically different from one another so we can add this information to the graph.
Instead of printing the TukeyHSD results in a table, we’ll do it in a graph.
The significant groupwise differences are any where the 95% confidence interval doesn’t include zero. This is another way of saying that the pvalue for these pairwise differences is < 0.05.
From this graph, we can see that the fertilizer + planting density combinations which are significantly different from one another are 3:11:1 (read as “fertilizer type three + planting density 1 contrasted with fertilizer type 1 + planting density type 1”), 1:21:1, 2:21:1, 3:21:1, and 3:22:1.
To summarize, we have 3 distinct groupings. Fertilizer 3, planting density 2 is different from all of the other combinations, as is fertilizer 1, planting density 1. All of the others are intermediate. So we can make three labels for our graph: A (representing 1:1), B (representing all the intermediate combinations), and C (representing 3:2).
Make a data frame with the group labels
Now we need to make an additional data frame so we can add these groupwise differences to our graph.
First, summarize the original data using fertilizer type and planting density as grouping variables.
Next, add the group labels as a new variable in the data frame.
Your data frame should look like this:
Now we are ready to start making the plot for our report.
Plot the raw data
The output looks like this:
Add the means and standard errors to the graph
The output looks like this:
This is very hard to read, since all of the different groupings for fertilizer type are stacked on top of one another. We will solve this in the next step.
Split up the data
To show which groups are different from one another, use facet_wrap()
to split the data up over the three types of fertilizer. To add labels, use geom_text()
, and add the group letters from the mean.yield.data dataframe you made earlier.
The output looks like this:
Make the graph ready for publication
In this step we will remove the grey background and add axis labels.
The final version of your graph looks like this:
Step 7: Report the results
In addition to a graph, it’s important to state the results of the ANOVA test. Include:
 A brief description of the variables you tested
 The fvalue, degrees of freedom, and pvalues for each independent variable
 What the results mean.
Frequently asked questions about ANOVA
 What is the difference between a oneway and a twoway ANOVA?

The only difference between oneway and twoway ANOVA is the number of independent variables. A oneway ANOVA has one independent variable, while a twoway ANOVA has two.
 Oneway ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
 Twoway ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finishing times in a marathon.
All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a ttest instead.
 What is a factorial ANOVA?

A factorial ANOVA is any ANOVA that uses more than one categorical independent variable. A twoway ANOVA is a type of factorial ANOVA.
Some examples of factorial ANOVAs include:
 Testing the combined effects of vaccination (vaccinated or not vaccinated) and health status (healthy or preexisting condition) on the rate of flu infection in a population.
 Testing the effects of marital status (married, single, divorced, widowed), job status (employed, selfemployed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
 Testing the effects of feed type (type A, B, or C) and barn crowding (not crowded, somewhat crowded, very crowded) on the final weight of chickens in a commercial farming operation.
 How is statistical significance calculated in an ANOVA?

In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.
Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).
If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.
 What is the difference between quantitative and categorical variables?

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).
Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).
You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.
5 comments
Matt Lavin
January 16, 2021 at 6:20 PMThank you for posting this very helpful analysis. The critique by Green should have been put more gently. Many thanks, Matt
Tuomas Plkkö
January 10, 2021 at 11:35 PMAnother error is also, that the section on posthoc analysis notes that there are "significant differences" with the different fertilizers, but then says that there is "no difference between fertilizer groups 2 and 1" because the pvalue is high.
This statement is false as it suggests that a large pvalue proves that there is no difference, which it does not. Actually, even the output of the test (shown lower) has a difference column displaying the size of the difference claimed to not exist.
It's important to realize that while a low pvalue allows you to "reject the null hypothesis", a high value does not allow you to accept or reject anything.
Shona McCombes (Scribbr Team)
January 19, 2021 at 7:04 PMHi Tuomas,
Thanks for pointing this out – I've adjusted the phrasing to make it clearer that we're saying the difference between groups 2 and 1 was not statistically significant. You're certainly right to emphasize that a high pvalue doesn't prove no difference – we also cover this topic in detail in a more recent article about statistical significance.
Peter Green
December 12, 2020 at 1:11 PMYour article is useful generally, but there's a serious error as well. Your suggestion (is Step 2) to histogram the response variable to check normality is a classic howler. The assumption of normality applies to the errors (the differences between the responses and the true linear predictors) not the responses themselves. So you can check the assumption only after fitting the model  by e.g. histogramming the residuals or using the diagnostic plots provided  as you suggest late in the piece. Please correct this mistake to avoid misleading people.
Shona McCombes (Scribbr Team)
December 21, 2020 at 7:58 PMHi Peter,
Thanks for pointing out this error. The article has now been updated to reflect this.