An introduction to statistical significance
If a result is statistically significant, that means it’s unlikely to be explained solely by chance or random factors. In other words, a statistically significant result has a very low chance of occurring if there were no true effect in a research study.
The p value, or probability value, tells you the statistical significance of a finding. In most studies, a p value of 0.05 or less is considered statistically significant, but this threshold can also be set higher or lower.
How do you test for statistical significance?
In quantitative research, data are analyzed through null hypothesis significance testing, or hypothesis testing. This is a formal procedure for assessing whether a relationship between variables or a difference between groups is statistically significant.
Null and alternative hypotheses
To begin, research predictions are rephrased into two main hypotheses:
- A null hypothesis (H0) always predicts no true effect, no relationship between variables, or no difference between groups.
- An alternative hypothesis (Ha or H1) states your main prediction of a true effect, a relationship between variables, or a difference between groups.
Hypothesis testing always starts with the assumption that the null hypothesis is true. Using this procedure, you can assess the likelihood (probability) of obtaining your results under this assumption. Based on the outcome of the test, you can reject or retain the null hypothesis.
Test statistics and p values
Every statistical test produces:
- A test statistic that indicates how closely your data match the null hypothesis.
- A corresponding p value that tells you the probability of obtaining a result at least this extreme if the null hypothesis is true.
The p value determines statistical significance. An extremely low p value indicates high statistical significance, while a high p value means low or no statistical significance.
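As a rough illustration of how a test statistic and its p value fit together, here is a minimal sketch of an exact permutation test for a difference between two group means. The function name `permutation_test` and the sample data are hypothetical, and a permutation test is only one of many possible tests; it is used here because it needs nothing beyond the Python standard library.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Returns (test statistic, p value). Only practical for small groups,
    because every possible split of the pooled data is enumerated.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = mean(group_a) - mean(group_b)  # the test statistic

    extreme = 0
    n_splits = 0
    # Under the null hypothesis the group labels are interchangeable,
    # so each way of assigning n_a values to group A is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # Count splits at least as extreme as the observed difference
        # (small tolerance guards against floating-point rounding).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        n_splits += 1
    return observed, extreme / n_splits


# Hypothetical scores for a treatment group and a control group.
treatment = [12, 14, 15, 16, 18]
control = [8, 9, 10, 11, 13]
statistic, p_value = permutation_test(treatment, control)
print(f"difference in means = {statistic:.2f}, p = {p_value:.4f}")
```

The p value here is simply the share of relabelings that produce a difference at least as large as the one observed, which mirrors the definition above.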
What is a significance level?
The significance level, or alpha (α), is a value that the researcher sets in advance as the threshold for statistical significance. It is the maximum risk of making a false positive conclusion (Type I error) that you are willing to accept.
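One way to see why alpha acts as a false positive risk is to simulate many studies in which the null hypothesis is true and count how often the test still comes out significant. The sketch below assumes a one-sample z-test with a known standard deviation of 1; the function names, the number of simulated experiments, and the sample size are all illustrative choices rather than anything prescribed here.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p value for a one-sample z-test with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2.0 * NormalDist().cdf(-abs(z))


def false_positive_rate(alpha, n_experiments=2000, n=30, seed=0):
    """Share of simulated null experiments that reach significance."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The null hypothesis is true by construction: the mean is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / n_experiments


for alpha in (0.01, 0.05, 0.10):
    rate = false_positive_rate(alpha)
    print(f"alpha = {alpha:.2f} -> false positive rate ~ {rate:.3f}")
```

With enough simulated experiments, the observed false positive rate should land close to the chosen alpha, apart from simulation noise.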
In a hypothesis test, the p value is compared to the significance level to decide whether to reject the null hypothesis.
- If the p value is higher than the significance level, the null hypothesis is not rejected, and the results are not statistically significant.
- If the p value is equal to or lower than the significance level, the null hypothesis is rejected, and the results are reported as statistically significant.
Usually, the significance level is set to 0.05 or 5%. That means your results must have a 5% or lower chance of occurring under the null hypothesis to be considered statistically significant.
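The comparison between a p value and the significance level can be written as a very small decision rule. This is only a sketch: the function name `decide` is hypothetical, and treating a p value exactly equal to alpha as significant is a convention assumed here.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a p value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Assumed convention: a p value exactly equal to alpha counts as significant.
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "retain the null hypothesis (not statistically significant)"


print(decide(0.03))              # compared against the usual 0.05
print(decide(0.20))
print(decide(0.03, alpha=0.01))  # same p value, stricter threshold
```

The last call shows that the same p value can lead to a different decision once the threshold changes, which is why the significance level should be fixed before the data are analyzed.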
The significance level can be lowered for a more conservative test. That means that, for a given sample size, an effect has to be larger to be considered statistically significant.
The significance level may also be set higher for significance testing in non-academic marketing or business contexts. This makes the study less rigorous and increases the probability of finding a statistically significant result.
As best practice, you should set a significance level before you begin your study. Otherwise, you can easily manipulate your results to match your research predictions.
It’s important to note that hypothesis testing can only show you whether to reject the null hypothesis in favor of the alternative hypothesis. It can never “prove” the null hypothesis, because the lack of a statistically significant effect doesn’t mean that absolutely no effect exists.
Problems with relying on statistical significance
There are various critiques of the concept of statistical significance and how it is used in research.
Researchers classify results as statistically significant or non-significant using a conventional threshold that lacks any theoretical or practical basis. This means that even a tiny 0.001 decrease in a p value can convert a research finding from statistically non-significant to significant with almost no real change in the effect.
On its own, statistical significance may also be misleading because it’s affected by sample size. In extremely large samples, you’re more likely to obtain statistically significant results, even if the effect is actually small or negligible in the real world. This means that small effects are often exaggerated if they meet the significance threshold, while interesting results are ignored when they fall short of meeting the threshold.
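The dependence on sample size can be sketched numerically. The example below assumes a two-sample z-test with equal group sizes and a known standard deviation of 1, and it holds the observed difference fixed at a tiny 0.02 standard deviations while only the sample size changes. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, n_per_group, sigma=1.0):
    """Two-sided p value for a difference in means, known sigma, equal n."""
    standard_error = sigma * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * NormalDist().cdf(-abs(z))


tiny_difference = 0.02  # assumed observed difference, in standard deviations
for n in (100, 10_000, 100_000):
    p = two_sample_z_p_value(tiny_difference, n)
    print(f"n per group = {n:>7}: p = {p:.6f}")
```

The effect is identical in every row. Only the sample size moves the p value across the threshold, which is why a significant result says little about how large an effect is.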
The strong emphasis on statistical significance has led to a serious publication bias and replication crisis in the social sciences and medicine over the last few decades. Results are usually only published in academic journals if they show statistically significant results, but statistically significant results often can’t be reproduced in high-quality replication studies.
As a result, many scientists call for retiring statistical significance as a decision-making tool in favor of more nuanced approaches to interpreting results.
Other types of significance in research
Aside from statistical significance, clinical significance and practical significance are also important research outcomes.
Practical significance shows you whether the research outcome is important enough to be meaningful in the real world. It’s indicated by the effect size of the study.
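Effect size can be measured in several ways. As one common example, and an assumption here rather than a measure this article prescribes, the sketch below computes Cohen's d, the difference between two group means divided by their pooled standard deviation. The function name and data are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: standardized difference between two group means."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical outcome scores for two groups.
treatment = [2, 4, 6]
control = [1, 3, 5]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

Unlike a p value, this number does not shrink or grow just because more participants are added, so it speaks more directly to whether a difference matters in practice.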
Clinical significance is relevant for intervention and treatment studies. A treatment is considered clinically significant when it tangibly or substantially improves the lives of patients.
Frequently asked questions about statistical significance
- What is statistical significance?
Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.
Statistical significance is arbitrary: it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data at least as extreme as those observed would be expected to occur less than 5% of the time under the null hypothesis.
When the p-value falls below the chosen alpha value, we say the result of the test is statistically significant.
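- What is a p-value?
A p-value, or probability value, is the probability of obtaining results at least as extreme as those you observed if the null hypothesis of your statistical test were true.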
- How do you calculate a p-value?
P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic at least as extreme as the one you observed is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.
If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.
- Does a p-value tell you whether your alternative hypothesis is true?
No. The p-value only tells you how likely your observed data would be if the null hypothesis were true.
If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.
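To make the answer about calculating p-values more concrete, the sketch below assumes the null distribution of the test statistic is a standard normal distribution (as in a z-test) and computes the two-sided tail area beyond an observed statistic. The function name is hypothetical, and other tests use other null distributions, such as t or chi-square.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Tail area beyond |z| in both tails of a standard normal null distribution."""
    return 2.0 * NormalDist().cdf(-abs(z))


# The farther the statistic falls from the mean of the null distribution (0),
# the smaller the p-value.
for z in (0.0, 1.0, 1.96, 3.0):
    print(f"z = {z:.2f} -> p = {two_sided_p_value(z):.4f}")
```

A statistic of about 1.96 sits right at the conventional 0.05 threshold for a two-sided test under this assumed null distribution.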