Published on
August 28, 2020
by
Rebecca Bevans.
Revised on
January 7, 2021.

The t-distribution, also known as Student’s t-distribution, is a way of describing data that follow a bell curve when plotted on a graph, with the greatest number of observations close to the mean and fewer observations in the tails.

It is a type of normal distribution used for smaller sample sizes, where the variance in the data is unknown.

In statistics, the t-distribution is most often used to:

Find the critical values for a confidence interval when the data is approximately normally distributed.

Published on
August 7, 2020
by
Rebecca Bevans.
Revised on
February 11, 2021.

When you make an estimate in statistics, whether it is a summary statistic or a test statistic, there is always uncertainty around that estimate because the number is based on a sample of the population you are studying.

The confidence interval is the range of values that you expect your estimate to fall between a certain percentage of the time if you run your experiment again or re-sample the population in the same way.

The confidence level is the percentage of times you expect to reproduce an estimate between the upper and lower bounds of the confidence interval, and is set by the alpha value.

Published on
July 17, 2020
by
Rebecca Bevans.
Revised on
January 7, 2021.

The test statistic is a number calculated from a statistical test of a hypothesis. It shows how closely your observed data match the distribution expected under the null hypothesis of that statistical test.

The test statistic is used to calculate the p-value of your results, helping to decide whether to reject your null hypothesis.

Published on
July 16, 2020
by
Rebecca Bevans.
Revised on
January 7, 2021.

The p-value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

P-values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p-value, the more likely you are to reject the null hypothesis.

The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, AIC is used to compare different possible models and determine which one is the best fit for the data. AIC is calculated from:

the number of independent variables used to build the model.

the maximum likelihood estimate of the model (how well the model reproduces the data).

The best-fit model according to AIC is the one that explains the greatest amount of variation using the fewest possible independent variables.

Published on
March 20, 2020
by
Rebecca Bevans.
Revised on
January 7, 2021.

ANOVA (Analysis of Variance) is a statistical test used to analyze the difference between the means of more than two groups.

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.

Published on
March 6, 2020
by
Rebecca Bevans.
Revised on
January 19, 2021.

ANOVA is a statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables. ANOVA tests whether there is a difference in means of the groups at each level of the independent variable.

The null hypothesis (H_{0}) of the ANOVA is no difference in means, and the alternate hypothesis (H_{a}) is that the means are different from one another.

In this guide, we will walk you through the process of a one-way ANOVA (one independent variable) and a two-way ANOVA (two independent variables).

Our sample dataset contains observations from an imaginary study of the effects of fertilizer type and planting density on crop yield.

We will also include examples of how to perform and interpret a two-way ANOVA with an interaction term, and an ANOVA with a blocking variable.

Published on
February 25, 2020
by
Rebecca Bevans.
Revised on
December 14, 2020.

Linear regression is a regression model that uses a straight line to describe the relationship between variables. It finds the line of best fit through your data by searching for the value of the regression coefficient(s) that minimizes the total error of the model.

Published on
February 20, 2020
by
Rebecca Bevans.
Revised on
October 26, 2020.

Regression models are used to describe relationships between variables by fitting a line to the observed data. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:

How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).

The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).