The p-value explained

The p-value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

P-values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p-value, the more likely you are to reject the null hypothesis.

What is a null hypothesis?

All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.

For example, in a two-tailed t-test, the null hypothesis is that the difference between two groups is zero.

Example: Null and alternative hypothesis
You want to know whether there is a difference in longevity between two groups of mice fed on different diets, diet A and diet B. You can statistically test the difference between these two diets using a two-tailed t-test.

  • Null hypothesis: there is no difference in longevity between the two groups.
  • Alternative hypothesis: there is a difference in longevity between the two groups.
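To make the test concrete, here is a minimal Python sketch that computes the two-sample t statistic by hand. The lifespan numbers are made up for illustration, and the pooled-variance formula assumes both groups share a common variance (the textbook "Student's" version of the test):

```python
import statistics

# Hypothetical lifespans in years for the two diets (illustrative data,
# not from a real experiment).
diet_a = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1]
diet_b = [2.6, 2.5, 2.7, 2.6, 2.4, 2.8]

def two_sample_t(x, y):
    """Pooled two-sample t statistic: the difference between group means,
    in units of the estimated standard error of that difference."""
    nx, ny = len(x), len(y)
    # Pooled variance assumes the two groups share a common variance.
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

t = two_sample_t(diet_a, diet_b)
print(round(t, 2))
```

A t statistic far from zero (here, strongly negative, because diet A mice live shorter) is evidence against the null hypothesis of no difference.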

What exactly is a p-value?

The p-value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number calculated by a statistical test using your data.

The p-value tells you how often you would expect to see a test statistic at least as extreme as the one calculated by your statistical test if the null hypothesis of that test were true. The p-value gets smaller as the test statistic calculated from your data gets further away from the range of test statistics predicted by the null hypothesis.

The p-value is a proportion: if your p-value is 0.05, that means that 5% of the time you would see a test statistic at least as extreme as the one you found if the null hypothesis were true.
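The "p-value as a proportion" idea can be made literal with a permutation test: if the null hypothesis is true, the diet labels are arbitrary, so shuffling them generates datasets from the null distribution, and the p-value is simply the fraction of those shuffles that produce a difference at least as extreme as the observed one. A stdlib-only sketch, again with made-up lifespans:

```python
import random
import statistics

random.seed(42)

# Hypothetical mouse lifespans (illustrative data).
diet_a = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1]
diet_b = [2.6, 2.5, 2.7, 2.6, 2.4, 2.8]

observed = abs(statistics.mean(diet_a) - statistics.mean(diet_b))

# Under the null hypothesis the labels carry no information, so shuffling
# them gives draws from the null distribution of the mean difference.
pooled = diet_a + diet_b
n = len(diet_a)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]))
    if diff >= observed:
        extreme += 1

# The p-value is literally a proportion: the fraction of null-hypothesis
# datasets at least as extreme as the data we actually observed.
p_value = extreme / trials
print(p_value)
```

Because these two (hypothetical) groups barely overlap, very few shuffles reach the observed difference and the estimated p-value comes out well below 0.05.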

Example: Test statistic and p-value
If the mice live equally long on either diet, then the test statistic from your t-test will be close to the value predicted by the null hypothesis (zero, reflecting no difference between groups), and the resulting p-value will be close to 1. It likely won’t reach exactly 1, because in real life the groups will probably not be perfectly equal.

If, however, there is an average difference in longevity between the two groups, then your test statistic will move further away from the values predicted by the null hypothesis, and the p-value will get smaller. The p-value will never reach zero, because there’s always a possibility, even if extremely unlikely, that the patterns in your data occurred by chance.

How do you calculate the p-value?

P-values are usually automatically calculated by your statistical program (R, SPSS, etc.).

You can also find tables for estimating the p-value of your test statistic online. These tables show, based on the test statistic and degrees of freedom (number of observations minus number of independent variables) of your test, how frequently you would expect to see that test statistic under the null hypothesis.
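The same lookup those tables perform can be sketched directly: given a test statistic and its degrees of freedom, the two-tailed p-value is the area in both tails of the null distribution beyond that statistic. The sketch below numerically integrates the Student's t density using only the standard library; the values t = 2.10 and df = 19 are illustrative, chosen to land just under the conventional 0.05 threshold:

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=200_000, upper=60.0):
    """Two-tailed p-value: the area under the null distribution beyond |t|
    in both tails, approximated with trapezoidal integration."""
    a, b = abs(t), upper
    h = (b - a) / steps
    area = sum(t_pdf(a + i * h, df) for i in range(1, steps))
    area += (t_pdf(a, df) + t_pdf(b, df)) / 2
    return 2 * area * h

# Illustrative values: t = 2.10 with 19 degrees of freedom falls just
# beyond the 0.05 critical value (about 2.093), so p is just under 0.05.
p = two_tailed_p(2.10, 19)
print(round(p, 3))
```

In practice you would let your statistical software do this, but the sketch shows why the same t statistic yields a different p-value at different degrees of freedom: the shape of the null distribution changes.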

The calculation of the p-value depends on the statistical test you are using to test your hypothesis:

  • Different statistical tests have different assumptions and generate different test statistics. You should choose the statistical test that best fits your data and matches the effect or relationship you want to test.
  • The number of independent variables you include in your test changes how large or small the test statistic needs to be to generate the same p-value.
Example: Choosing a statistical test
If you are comparing only two different diets, then a two-sample t-test is a good way to compare the groups. To compare three different diets, use an ANOVA instead – running multiple pairwise comparisons inflates the chance of a false positive and leads you to overestimate the significance of the differences between groups.
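The inflation from multiple pairwise comparisons can be sketched with a back-of-the-envelope calculation. Treating the three pairwise tests as independent is itself a simplifying assumption (they share data, so this is only an approximation), but it shows the direction and rough size of the effect:

```python
alpha = 0.05     # per-test false-positive rate
comparisons = 3  # A vs B, A vs C, B vs C

# If each test independently has a 5% chance of a false positive, the
# chance that AT LEAST ONE of the three comparisons is spuriously
# "significant" is 1 minus the chance that all three stay quiet.
family_error = 1 - (1 - alpha) ** comparisons
print(round(family_error, 3))  # 0.143
```

So even with nothing going on, roughly a 14% chance of at least one "significant" pairwise result – nearly three times the nominal 5% – which is why a single ANOVA is preferred for comparing three or more groups.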

No matter what test you use, the p-value always describes the same thing: how often you can expect to see a test statistic as extreme or more extreme than the one calculated from your test.

P-values and statistical significance

P-values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.

Statistical significance is another way of saying that the p-value of a statistical test is small enough to reject the null hypothesis of the test.

How small is small enough? The most common threshold is p < 0.05; that is, when you would expect to find a test statistic at least as extreme as the one calculated by your test less than 5% of the time. But the threshold depends on your field of study – some fields prefer thresholds of 0.01, or even 0.001.

The threshold value for determining statistical significance is also known as the alpha value.

Example: Statistical significance
Your comparison of the two mouse diets results in a p-value of less than 0.01, below your alpha value of 0.05; therefore you determine that there is a statistically significant difference between the two diets.
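The decision rule itself is a single comparison. A minimal sketch, with a hypothetical p-value standing in for the diet result:

```python
p_value = 0.008  # hypothetical result from the diet comparison
alpha = 0.05     # significance threshold, chosen before running the test

# Decision rule: reject the null hypothesis only when p falls below alpha.
if p_value < alpha:
    print("statistically significant: reject the null hypothesis")
else:
    print("not significant: fail to reject the null hypothesis")
```

Note that alpha should be fixed in advance; choosing it after seeing the p-value defeats the purpose of the threshold.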

Reporting p-values

P-values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p-values in context – for example, correlation coefficient in a linear regression, or the average difference between treatment groups in a t-test.

Example: Reporting the results
In our comparison of mouse diet A and mouse diet B, we found that the lifespan on diet A (mean = 2.1 years; sd = 0.12) was significantly shorter than the lifespan on diet B (mean = 2.6 years; sd = 0.1), with an average difference of 6 months (t(80) = -12.75; p < 0.01).

Caution when using p-values

P-values are often interpreted as your risk of rejecting the null hypothesis of your test when the null hypothesis is actually true.

In reality, the risk of rejecting the null hypothesis is often higher than the p-value, especially when looking at a single study or when using small sample sizes. This is because the smaller your frame of reference, the greater the chance that you stumble across a statistically significant pattern completely by accident.
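That "stumbling by accident" is easy to demonstrate by simulation: generate many small "studies" in which the null hypothesis is true by construction, and watch roughly 5% of them cross the significance threshold anyway. A stdlib-only sketch (group sizes, means, and the seed are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(1)

def two_sample_t(x, y):
    """Pooled two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

# Simulate 2,000 small studies where the null hypothesis is TRUE:
# both groups are drawn from the same population.
trials = 2000
significant = 0
for _ in range(trials):
    a = [random.gauss(2.3, 0.3) for _ in range(6)]
    b = [random.gauss(2.3, 0.3) for _ in range(6)]
    # 2.228 is the two-tailed critical t value for alpha = 0.05 with
    # 10 degrees of freedom (6 + 6 - 2).
    if abs(two_sample_t(a, b)) > 2.228:
        significant += 1

false_positive_rate = significant / trials
print(false_positive_rate)  # close to 0.05
```

Around one study in twenty comes out "significant" even though nothing is going on, which is why a single small significant study is weak evidence on its own.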

P-values are also often interpreted as supporting or refuting the alternative hypothesis. This is not the case. The p-value can only tell you whether or not your data are consistent with the null hypothesis. It cannot tell you whether your alternative hypothesis is true, or why.

Frequently asked questions about p-values

What is a p-value?

A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

How do you calculate a p-value?

P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

What is statistical significance?

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data as extreme as yours would occur less than 5% of the time if the null hypothesis were true.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Does a p-value tell you whether your alternative hypothesis is true?

No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

Rebecca Bevans

