The p-value explained
The p-value is a number, calculated from a statistical test, that describes how likely you would be to obtain data at least as extreme as your observations if the null hypothesis were true.
P-values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis, and the more likely you are to reject it.
What is a null hypothesis?
All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.
For example, in a two-tailed two-sample t-test, the null hypothesis is that the difference between the two group means is zero.
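To make that null hypothesis concrete, here is a minimal sketch of the test statistic for a two-sample comparison: it measures how many standard errors the observed difference in means lies from the zero difference the null hypothesis predicts. It assumes Welch's version of the t-test (which does not assume equal variances); the text does not say which variant it means, and the `welch_t` function name and the sample values are hypothetical, for illustration only.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic for H0: mean(a) - mean(b) == 0."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference, without assuming equal variances
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / se


# Hypothetical measurements for two groups
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treatment = [5.6, 5.8, 5.5, 5.9, 5.7]
print(round(welch_t(treatment, control), 3))
```

A statistic near zero is what the null hypothesis predicts; the further it is from zero, the smaller the resulting p-value.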
What exactly is a p-value?
The p-value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by working out the probability of obtaining your test statistic (the number calculated by a statistical test using your data), or one more extreme, if the null hypothesis were true.
The p-value tells you how often you would expect to see a test statistic as extreme as, or more extreme than, the one calculated by your statistical test if the null hypothesis of that test were true. The p-value gets smaller as the test statistic calculated from your data gets further away from the range of test statistics predicted by the null hypothesis.
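As a minimal sketch, assume a z-test, where the test statistic follows a standard normal distribution under the null hypothesis (other tests have other null distributions). The two-tailed p-value is then the probability of a value at least as far from zero as the observed one, which Python's `math.erfc` gives directly. The function name is hypothetical.

```python
import math


def two_tailed_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-tailed p-value."""
    return math.erfc(abs(z) / math.sqrt(2))


# The further z is from 0 (the null prediction), the smaller the p-value
for z in [0.0, 1.0, 1.96, 3.0]:
    print(z, round(two_tailed_p_from_z(z), 4))
```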
The p-value is a probability: if your p-value is 0.05, that means that if the null hypothesis were true, you would see a test statistic at least as extreme as the one you found 5% of the time.
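The "5% of the time" reading can be checked by simulation. This sketch assumes the same z-test setting (standard normal statistic under the null), where |z| ≥ 1.96 corresponds to a two-tailed p-value of about 0.05; it draws many statistics from the null distribution and counts how often they are that extreme. The function name, simulation count and seed are arbitrary choices.

```python
import random


def simulate_null_rate(n_sims=100_000, z_crit=1.96, seed=1):
    """Fraction of simulated null-hypothesis z statistics with |z| >= z_crit."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n_sims) if abs(rng.gauss(0.0, 1.0)) >= z_crit)
    return hits / n_sims


print(simulate_null_rate())  # close to 0.05
```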
How do you calculate the p-value?
P-values are usually automatically calculated by your statistical program (R, SPSS, etc.).
You can also find tables online for estimating the p-value of your test statistic. These tables show, based on the test statistic and the degrees of freedom of your test (roughly, the number of observations minus the number of parameters estimated; for example, n − 1 for a one-sample t-test on n observations), how frequently you would expect to see that test statistic under the null hypothesis.
The calculation of the p-value depends on the statistical test you are using to test your hypothesis:
- Different statistical tests have different assumptions and generate different test statistics. You should choose the statistical test that best fits your data and matches the effect or relationship you want to test.
- The number of observations and independent variables in your test determines its degrees of freedom, which changes how large or small the test statistic needs to be to generate the same p-value.
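To see how degrees of freedom change the p-value for the same test statistic, here is a sketch using Student's t distribution. It computes the two-tailed p-value by numerically integrating the t density with Simpson's rule, a choice made here only to stay within the standard library; in practice your statistical program or a library such as SciPy does this for you. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p_from_t(t, df, steps=10_000):
    """Two-tailed p-value 2 * P(T >= |t|), via Simpson's rule on [0, |t|]."""
    a, b = 0.0, abs(t)
    if b == 0.0:
        return 1.0
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area_0_to_t = total * h / 3  # P(0 <= T <= |t|)
    # Clamp tiny negative values caused by rounding for very large |t|
    return max(0.0, 2 * (0.5 - area_0_to_t))


# Same test statistic, different degrees of freedom
for df in [3, 10, 30, 1000]:
    print(df, round(two_tailed_p_from_t(2.2, df), 4))
```

With fewer degrees of freedom the t distribution has heavier tails, so the same statistic of 2.2 gives a larger p-value.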
No matter what test you use, the p-value always describes the same thing: how often you can expect to see a test statistic as extreme as, or more extreme than, the one calculated from your test if the null hypothesis were true.
P-values and statistical significance
P-values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.
Statistical significance is another way of saying that the p-value of a statistical test is small enough to reject the null hypothesis of the test.
How small is small enough? The most common threshold is p < 0.05; that is, when you would expect to find a test statistic at least as extreme as the one calculated by your test less than 5% of the time if the null hypothesis were true. But the threshold depends on your field of study: some fields prefer thresholds of 0.01, or even 0.001.
The threshold value for determining statistical significance is also known as the alpha value.
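The decision rule described above amounts to a single comparison. This minimal sketch uses a hypothetical function name and a hypothetical p-value of 0.03, and follows the text's strict "p < alpha" convention.

```python
def is_significant(p_value, alpha=0.05):
    """Return True when the p-value falls below the chosen alpha threshold."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return p_value < alpha


p = 0.03  # hypothetical p-value from some test
for alpha in [0.05, 0.01, 0.001]:
    print(alpha, is_significant(p, alpha))
```

The same p-value can be significant under one field's threshold and not under another's, which is why the alpha value should be chosen before looking at the data.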
P-values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p-values in context: for example, the regression coefficient in a linear regression, or the average difference between treatment groups in a t-test.
Caution when using p-values
P-values are often interpreted as your risk of being wrong when you reject the null hypothesis, that is, the chance that the null hypothesis is actually true even though you rejected it.
In reality, this risk is often higher than the p-value, especially when looking at a single study or when using small sample sizes. This is because small samples give noisy estimates and low statistical power, so a larger share of the statistically significant patterns they turn up arise completely by accident.
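One common way to quantify this risk (not a formula given in the text) is the share of significant results in which the null hypothesis was actually true. It depends on the alpha threshold, the study's power, and the proportion of tested hypotheses that reflect a real effect. The 10% prior below is purely an assumption for illustration, and the function name is hypothetical.

```python
def false_positive_share(alpha, power, prior_true):
    """Share of significant results where the null hypothesis was true.

    alpha: significance threshold (Type I error rate)
    power: probability of a significant result when the effect is real
    prior_true: assumed proportion of tested hypotheses with a real effect
    """
    false_pos = alpha * (1 - prior_true)
    true_pos = power * prior_true
    return false_pos / (false_pos + true_pos)


# Hypothetical scenario: 10% of tested effects are real
for power in [0.8, 0.2]:
    print(power, round(false_positive_share(0.05, power, 0.10), 3))
```

Under these assumptions, about 36% of significant results are false positives at 80% power, rising to about 69% at 20% power (typical of small samples), far above the 5% threshold.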
P-values are also often interpreted as supporting or refuting the alternative hypothesis. This is not the case. The p-value only tells you how compatible your data are with the null hypothesis. It cannot tell you whether your alternative hypothesis is true, or why.
Frequently asked questions about p-values
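- What is a p-value?
A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.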
- How do you calculate a p-value?
P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic at least as extreme as yours is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.
If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.
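One way to see a null distribution directly is a permutation test, used here as an illustrative choice rather than the method of any particular program. If the null hypothesis holds, the group labels do not matter, so every way of re-splitting the pooled data into groups of the original sizes is equally likely. The p-value is the share of those re-splits whose difference in means is at least as extreme as the observed one. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Builds the null distribution by re-splitting the pooled data into every
    possible pair of groups with the original sizes (H0: labels don't matter).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count re-splits at least as extreme as the observed difference
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


print(permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8]))
```

Here only 2 of the 70 possible splits are as extreme as the observed one, so p = 2/70 ≈ 0.029: the observed statistic sits far out in the tail of its null distribution.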
- What is statistical significance?
Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.
Statistical significance is arbitrary: it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data at least as extreme as yours would occur less than 5% of the time under the null hypothesis.
When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.
- Does a p-value tell you whether your alternative hypothesis is true?
No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.
If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.