An introduction to statistical significance

If a result is statistically significant, that means it’s unlikely to be explained solely by chance or random factors. In other words, a statistically significant result would have a very low chance of occurring if there were no true effect in a research study.

The p value, or probability value, tells you the statistical significance of a finding. In most studies, a p value of 0.05 or less is considered statistically significant, but this threshold can also be set higher or lower.

How do you test for statistical significance?

In quantitative research, data are analyzed through null hypothesis significance testing, or hypothesis testing. This is a formal procedure for assessing whether a relationship between variables or a difference between groups is statistically significant.

Null and alternative hypotheses

To begin, research predictions are rephrased into two main hypotheses:

  • A null hypothesis (H0) always predicts no true effect, no relationship between variables, or no difference between groups.
  • An alternative hypothesis (Ha or H1) states your main prediction of a true effect, a relationship between variables, or a difference between groups.

Hypothesis testing always starts with the assumption that the null hypothesis is true. Using this procedure, you can assess the likelihood (probability) of obtaining your results under this assumption. Based on the outcome of the test, you can reject or retain the null hypothesis.

Example: Formulating a null and alternative hypothesis
You design an experiment to test whether actively smiling can make people feel happier. To begin, you restate your predictions into a null and alternative hypothesis.

  • H0: There is no difference in happiness between actively smiling and not smiling.
  • Ha: Actively smiling leads to more happiness than not smiling.

Test statistics and p values

Every statistical test produces:

  • A test statistic that indicates how closely your data match the null hypothesis.
  • A corresponding p value that tells you the probability of obtaining a result at least as extreme as yours if the null hypothesis is true.

The p value determines statistical significance. An extremely low p value indicates high statistical significance, while a high p value means low or no statistical significance.

Example: Hypothesis testing
To test your hypothesis, you first collect data from two groups. The experimental group actively smiles, while the control group does not. Both groups record happiness ratings on a scale from 1 to 7.

Next, you perform a t test to see whether actively smiling leads to more happiness. Using the difference in average happiness between the two groups, you calculate:

  • a t value (the test statistic) that tells you how much the sample data differ from what the null hypothesis predicts,
  • a p value showing the likelihood of finding a result at least this extreme if the null hypothesis is true.
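
As a rough sketch of the calculation described above, the pooled-variance t statistic can be computed with Python's standard library. The ratings are made-up values for illustration, not data from the study, and `pooled_t` is a hypothetical helper name. The sketch assumes equal variances in the two groups.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t test that assumes equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error, df


# Hypothetical happiness ratings on the 1-7 scale (illustrative only).
smiling = [5, 6, 4, 7, 5, 6, 3, 5]
control = [4, 3, 5, 4, 2, 4, 3, 5]

t_value, df = pooled_t(smiling, control)
print(f"t({df}) = {t_value:.2f}")
```

The p value is then read from the t distribution with `df` degrees of freedom, a step that statistical software normally handles for you.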

    To interpret your results, you will compare your p value to a predetermined significance level.

    What is a significance level?

    The significance level, or alpha (α), is a value that the researcher sets in advance as the threshold for statistical significance. It is the maximum risk of making a false positive conclusion (Type I error) that you are willing to accept.

    In a hypothesis test, the p value is compared to the significance level to decide whether to reject the null hypothesis.

    • If the p value is higher than the significance level, the null hypothesis is not refuted, and the results are not statistically significant.
    • If the p value is lower than the significance level, the results are interpreted as refuting the null hypothesis and reported as statistically significant.
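
The decision rule above can be written as a small function. This is only a sketch: `decide` is a hypothetical name, and treating a p value exactly equal to alpha as significant is an assumption that follows the "0.05 or less" convention.

```python
def decide(p_value, alpha=0.05):
    """Compare a p value to the significance level chosen in advance."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "retain the null hypothesis (not statistically significant)"


print(decide(0.03))              # p value below alpha = 0.05
print(decide(0.20))              # p value above alpha = 0.05
print(decide(0.03, alpha=0.01))  # same p value, stricter threshold
```

The last call shows that the same p value can lead to a different decision when the threshold changes.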

    Usually, the significance level is set to 0.05 or 5%. That means your results must have a 5% or lower chance of occurring under the null hypothesis to be considered statistically significant.
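
One way to see what a 5% threshold means is to simulate many studies in which the null hypothesis is true by construction, then count how often the result is significant anyway. The sketch below makes several simplifying assumptions: a z test with a known standard deviation of 1, 30 observations per study, and 2,000 simulated studies. The share of significant results should land close to alpha.

```python
import math
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p value for H0: population mean = 0, with known sigma."""
    z = mean(sample) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


random.seed(1)  # fixed seed so the run is repeatable
alpha = 0.05
n_studies = 2000

# Every simulated study samples from a population where H0 is true (mean 0).
false_positives = sum(
    z_test_p_value([random.gauss(0, 1) for _ in range(30)]) <= alpha
    for _ in range(n_studies)
)
false_positive_rate = false_positives / n_studies
print(f"Share of significant results under H0: {false_positive_rate:.3f}")
```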

    The significance level can be lowered for a more conservative test. That means that, for a given sample size, the observed effect has to be larger to be considered statistically significant.

    The significance level may also be set higher for significance testing in non-academic marketing or business contexts. This makes the study less rigorous: it increases the probability of finding a statistically significant result, including the probability of a false positive.

    As best practice, you should set a significance level before you begin your study. Otherwise, you can easily manipulate your results to match your research predictions.

    It’s important to note that hypothesis testing can only show you whether or not to reject the null hypothesis in favor of the alternative hypothesis. It can never “prove” the null hypothesis, because the lack of a statistically significant effect doesn’t mean that absolutely no effect exists.

    Example: Statistical decision making
    Through your hypothesis test, you obtain a p value of 0.029. Since this p value is lower than your significance level of 0.05, you consider your results statistically significant and reject the null hypothesis.

    That means the difference in happiness levels of the different groups can be attributed to the experimental manipulation.

    When reporting statistical significance, include relevant descriptive statistics about your data (e.g. means and standard deviations) as well as the test statistic and p value.

    Example: Reporting statistical significance
    Consistent with the alternative hypothesis, the experimental group (M = 4.67, SD = 2.14) reported significantly more happiness than the control group (M = 3.81, SD = 1.92), t(108) = 2.22, p = .029.

    Problems with relying on statistical significance

    There are various critiques of the concept of statistical significance and how it is used in research.

    Researchers classify results as statistically significant or non-significant using a conventional threshold that lacks any theoretical or practical basis. This means that even a tiny 0.001 decrease in a p value can convert a research finding from statistically non-significant to significant with almost no real change in the effect.

    On its own, statistical significance may also be misleading because it’s affected by sample size. In extremely large samples, you’re more likely to obtain statistically significant results, even if the effect is actually small or negligible in the real world. This means that small effects are often exaggerated if they meet the significance threshold, while interesting results are ignored when they fall short of meeting the threshold.
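
The effect of sample size can be illustrated with a short calculation. The sketch below holds a tiny standardized effect fixed and varies only the number of participants per group. It uses a large-sample z approximation rather than an exact t test, and both the effect size of 0.05 and the function name `two_sided_p` are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def two_sided_p(effect_size, n_per_group):
    """Approximate p value for a standardized mean difference between two
    equal-sized groups, using a large-sample z approximation."""
    z = effect_size * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


tiny_effect = 0.05  # hypothetical standardized difference (Cohen's d)
for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group = {n:>7}: p = {two_sided_p(tiny_effect, n):.4f}")
```

The effect is identical in every row. Only the sample size changes, yet the p value falls from well above 0.05 to far below it.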

    The strong emphasis on statistical significance has led to a serious publication bias and replication crisis in the social sciences and medicine over the last few decades. Results are usually only published in academic journals if they show statistically significant results—but statistically significant results often can’t be reproduced in high quality replication studies.

    As a result, many scientists call for retiring statistical significance as a decision-making tool in favor of more nuanced approaches to interpreting results.

    That’s why APA guidelines advise reporting not only p values but also effect sizes and confidence intervals wherever possible to show the real world implications of a research outcome.
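
As an illustration of a confidence interval, the sketch below computes an approximate 95% interval for a difference in means from summary statistics. It uses a normal approximation rather than the exact t distribution. The means and standard deviations are taken from the reporting example, but the group size of 55 each is an assumption: the example reports only df = 108, which implies 110 participants in total. `mean_diff_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def mean_diff_ci(m1, sd1, n1, m2, sd2, n2, level=0.95):
    """Large-sample (normal approximation) CI for a difference in means."""
    diff = m1 - m2
    std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z_crit * std_error, diff + z_crit * std_error


# Means and SDs from the reporting example; 55 per group is an assumption.
low, high = mean_diff_ci(4.67, 2.14, 55, 3.81, 1.92, 55)
print(f"Difference = {4.67 - 3.81:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

An interval conveys both the direction of the effect and how precisely it has been estimated, which a p value alone does not.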

    Other types of significance in research

    Aside from statistical significance, clinical significance and practical significance are also important research outcomes.

    Practical significance shows you whether the research outcome is important enough to be meaningful in the real world. It’s indicated by the effect size of the study.

    Example: Practical significance
    To report practical significance, you calculate the effect size of your statistically significant finding of higher happiness ratings in the experimental group.

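    Dividing the difference in group means (4.67 - 3.81 = 0.86) by the pooled standard deviation (about 2.03, assuming equal group sizes) gives a Cohen’s d of about 0.42, indicating a small-to-medium effect size.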
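
For two independent groups, Cohen's d is the difference in means divided by the pooled standard deviation. The sketch below shows that calculation on made-up ratings, not the study's data. `cohens_d` is a hypothetical helper name.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / math.sqrt(pooled_var)


# Hypothetical ratings (illustrative only, not the study's data).
treatment = [5, 4, 6, 3, 5, 4, 5, 4]
control = [4, 4, 5, 3, 5, 3, 5, 4]
d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's commonly cited rules of thumb, a d of around 0.2 is small, 0.5 is medium, and 0.8 is large.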

    Clinical significance is relevant for intervention and treatment studies. A treatment is considered clinically significant when it tangibly or substantially improves the lives of patients.

    Frequently asked questions about statistical significance

    What is statistical significance?

    Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

    The cutoff for statistical significance is a convention: it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data at least as extreme as those observed would be expected less than 5% of the time under the null hypothesis.

    When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

    What is a p-value?

    A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

    How do you calculate a p-value?

    P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

    P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

    If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.
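
The idea of reading a p value off a null distribution can be sketched with a permutation test. This is a different technique from the t test used in the examples above. It builds the null distribution directly by shuffling group labels many times, then counts how often the shuffled difference in means is at least as extreme as the observed one. The ratings are made up, and the function name, number of permutations, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided p value for a difference in means via random relabeling."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical ratings (illustrative only).
smiling = [5, 6, 4, 7, 5, 6, 3, 5]
control = [4, 3, 5, 4, 2, 4, 3, 5]
p_value = permutation_p_value(smiling, control)
print(f"Permutation p value: {p_value:.3f}")
```

The further the observed difference sits in the tails of the shuffled differences, the smaller the resulting p value.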

    Does a p-value tell you whether your alternative hypothesis is true?

    No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

    If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

    Pritha Bhandari

    Pritha has an academic background in English, psychology and cognitive neuroscience. As an interdisciplinary researcher, she enjoys writing articles explaining tricky research concepts for students and academics.