Correlation vs causation

Correlation means there is a statistical association between variables. Causation means that a change in one variable causes a change in another variable.

In research, you might have come across the phrase “correlation doesn’t imply causation.” Correlation and causation are two related ideas, but understanding their differences will help you critically evaluate and interpret scientific research.

What’s the difference?

Correlation describes an association between variables: when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables. These variables change together: they covary. But this covariation isn’t necessarily due to a direct or indirect causal link.

Causation means that changes in one variable bring about changes in the other; there is a cause-and-effect relationship between variables. The two variables are correlated with each other and there is also a causal link between them.

A correlation doesn’t imply causation, but a causal relationship between two variables generally shows up as a correlation.

Why doesn’t correlation mean causation?

There are two main reasons why correlation isn’t causation. Identifying these problems is important for drawing sound scientific conclusions from research.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not. For example, ice cream sales and violent crime rates are closely correlated, but they are not causally linked with each other. Instead, hot temperatures, a third variable, affect both variables separately.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other. For example, vitamin D levels are correlated with depression, but it’s not clear whether low vitamin D causes depression, or whether depression causes reduced vitamin D intake.

You’ll need to use an appropriate research design to distinguish between correlational and causal relationships.

Correlational research designs can only demonstrate correlational links between variables, while experimental designs can test causation.

Correlational research

In a correlational research design, you collect data on your variables without manipulating them.

Example: Correlational research
You collect survey data to investigate whether there is a relationship between physical activity levels and self-esteem. You ask participants about their current levels of exercise and measure their self-esteem using an inventory.

You find that physical activity level is positively correlated with self-esteem: lower levels of physical activity are associated with lower self-esteem, while higher levels of physical activity are associated with higher self-esteem.

Correlational research is usually high in external validity, so you can generalize your findings to real-life settings. But these studies are low in internal validity, which makes it difficult to causally connect changes in one variable to changes in the other.

These research designs are commonly used when it’s unethical, too costly, or too difficult to perform controlled experiments. They are also used to study relationships that aren’t expected to be causal.

Example: Correlational research
To study whether consuming violent media is related to aggression, you collect data on children’s video game use and their behavioral tendencies. You ask parents to report the number of hours per week their child spends playing violent video games, and you survey parents and teachers on the children’s behaviors.

You find a positive correlation between the variables: children who spend more time playing violent video games have higher rates of aggressive behavior.

Third variable problem

Without controlled experiments, it’s hard to say whether the variable you’re interested in caused the changes in another variable. An extraneous variable is any variable other than your variables of interest that could affect your results.

Limited control in correlational research means that extraneous or confounding variables can serve as alternative explanations for the results. Confounding variables can make it seem as though a correlational relationship is causal when it isn’t.

Example: Extraneous and confounding variables
In your study on violent video games and aggression, parental attention is a confounding variable that could influence both how much children play violent video games and their behavioral tendencies. Low-quality parental attention can increase both violent video game use and aggressive behaviors in children.

But because you don’t control for it, you can only conclude that your main variables are correlated.

When two variables are correlated, all you can say is that changes in one variable occur alongside changes in the other.
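
The following is a minimal simulation sketch of the third variable problem, not an analysis of real data. It assumes a hypothetical standardized "attention" score that lowers both gaming hours and aggression, while gaming and aggression have no effect on each other. The effect size of -0.7, the sample size, and the helper names `pearson_r` and `residuals` are all illustrative assumptions.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fitted on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical confounder: quality of parental attention (standardized).
attention = [rng.gauss(0, 1) for _ in range(n)]

# Both variables depend on attention, but NOT on each other.
gaming = [-0.7 * a + rng.gauss(0, 1) for a in attention]
aggression = [-0.7 * a + rng.gauss(0, 1) for a in attention]

raw_r = pearson_r(gaming, aggression)
adjusted_r = pearson_r(residuals(gaming, attention),
                       residuals(aggression, attention))

print(f"correlation ignoring attention:      {raw_r:.2f}")
print(f"correlation adjusting for attention: {adjusted_r:.2f}")
```

In this simulated setup, gaming and aggression are clearly correlated even though neither causes the other, and the correlation shrinks toward zero once the confounder is accounted for. In a real study you can only adjust for confounders you have actually measured.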

Spurious correlations

A spurious correlation occurs when two variables appear to be related because of a hidden third variable or simply by coincidence.

Example: Spurious correlation
In Germany and Denmark, statistical evidence shows a clear positive correlation between the population of storks and the birth rate spanning decades. As the stork population fluctuates, so does the number of newborns. How do you account for this pattern?

The Theory of the Stork draws a simple causal link between the variables to argue that storks physically deliver babies. This satirical study shows why you can’t conclude causation from correlational research alone.

In reality, the correlation may be explained by third variables (such as weather patterns or environmental developments) that caused an increase in both the stork and human populations, or the link may be purely coincidental.

When you analyze correlations in a large dataset with many variables, the chances of finding at least one statistically significant result are high, even if none of the variables are truly related. In this case, you’re more likely to make a type I error: erroneously concluding that there is a true correlation between variables in the population when the association in your sample arose by chance.
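
To give a rough sense of how quickly this risk grows, the sketch below computes the chance of at least one false positive when every pairwise correlation among a set of unrelated variables is tested. It assumes a significance level of 0.05 and treats the tests as independent, which is a simplification (pairwise correlations that share a variable are not strictly independent). The function name `familywise_error_rate` is an illustrative choice.

```python
from math import comb

def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests,
    each run at significance level alpha, when no true effects exist."""
    return 1 - (1 - alpha) ** num_tests

for num_variables in (2, 5, 10, 20):
    # Number of distinct pairs of variables, i.e. correlations to test.
    num_pairs = comb(num_variables, 2)
    rate = familywise_error_rate(num_pairs)
    print(f"{num_variables:>2} variables -> {num_pairs:>3} correlations, "
          f"P(at least one false positive) = {rate:.2f}")
```

Under these assumptions, a dataset with just 10 unrelated variables already involves 45 pairwise correlations, and finding at least one "significant" result by chance becomes far more likely than not.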

Directionality problem

To demonstrate causation, you need to show a directional relationship with no alternative explanations. This relationship can be unidirectional, with one variable impacting the other, or bidirectional, where both variables impact each other.

A correlational design can’t distinguish between these possibilities, but an experimental design can test each possible direction, one at a time.

Example: Directionality problem
The variables of physical activity and self-esteem can be causally related in three ways:

  • Physical activity may affect self-esteem
  • Self-esteem may affect physical activity
  • Physical activity and self-esteem may both affect each other

In correlational research, the directionality of a relationship is unclear because there is limited researcher control. You risk mistaking the direction of the relationship, for example by overlooking reverse causality, where the effect actually runs opposite to the direction you assumed.
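
One way to see why a correlation can’t reveal direction is that the correlation coefficient is symmetric: it is the same whichever variable you list first. The sketch below simulates two hypothetical worlds, one where physical activity drives self-esteem and one where self-esteem drives physical activity. The effect size (0.6), noise level, and sample size are illustrative assumptions, not estimates from any study.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 1000

# Hypothetical world A: physical activity drives self-esteem.
activity_a = [rng.gauss(0, 1) for _ in range(n)]
esteem_a = [0.6 * a + rng.gauss(0, 0.8) for a in activity_a]

# Hypothetical world B: self-esteem drives physical activity.
esteem_b = [rng.gauss(0, 1) for _ in range(n)]
activity_b = [0.6 * e + rng.gauss(0, 0.8) for e in esteem_b]

r_a = pearson_r(activity_a, esteem_a)
r_b = pearson_r(activity_b, esteem_b)

print(f"world A (activity -> esteem): r = {r_a:.2f}")
print(f"world B (esteem -> activity): r = {r_b:.2f}")
```

Both simulated worlds produce a similar positive correlation, so the observed coefficient alone gives no hint about which causal story generated the data.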

Causal research

Causal links between variables can only be truly demonstrated with controlled experiments. Experiments test formal predictions, called hypotheses, to establish causality in one direction at a time.

Experiments are high in internal validity, so cause-and-effect relationships can be demonstrated with reasonable confidence.

You can establish the direction of an effect because you manipulate an independent variable before measuring the change in a dependent variable.

Example: Testing directionality in an experimental design
You believe that physical activity level affects self-esteem, so you test this hypothesis in an experiment. You apply a physical activity intervention and measure changes in self-esteem. To establish directionality, your physical activity intervention has to come before any observed change in self-esteem.

To test whether this relationship is bidirectional, you’ll need to design a new experiment assessing whether self-esteem can impact physical activity level.

In a controlled experiment, you can also eliminate the influence of third variables by using random assignment and control groups.

Random assignment helps distribute participant characteristics evenly between groups so that they’re similar and comparable. A control group lets you compare the experimental manipulation to a similar treatment or no treatment.

Example: Controlling third variables in an experimental design
You randomly place each participant into a control group or an experimental group. Random assignment balances third-variable participant characteristics, such as age or mental health status, across the groups on average, so they are unlikely to bias your results.

The control group receives an unrelated, comparable intervention, while the experimental group receives the physical activity intervention. Because all variables except your independent variable are kept constant between groups, any differences between groups can be attributed to your intervention.
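
As a rough illustration of how random assignment works, the sketch below simulates a hypothetical experiment in which age influences self-esteem. All numbers (the age range, the baseline score, the assumed true intervention effect of 2.0, and the noise level) are invented for the simulation and labeled as such in the code.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 2000

# Hypothetical participant characteristic that also influences self-esteem.
ages = [rng.uniform(18, 65) for _ in range(n)]

# Random assignment: shuffle participant indices, first half gets the intervention.
indices = list(range(n))
rng.shuffle(indices)
treated = set(indices[: n // 2])

TRUE_EFFECT = 2.0  # assumed effect of the intervention, in arbitrary score units

def simulated_self_esteem(age, is_treated):
    """Invented outcome model: baseline + age effect + intervention effect + noise."""
    effect = TRUE_EFFECT if is_treated else 0.0
    return 20 + 0.1 * age + effect + rng.gauss(0, 3)

scores = [simulated_self_esteem(ages[i], i in treated) for i in range(n)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated_ids = [i for i in range(n) if i in treated]
control_ids = [i for i in range(n) if i not in treated]

# Random assignment should leave the groups with similar average ages.
age_gap = mean(ages[i] for i in treated_ids) - mean(ages[i] for i in control_ids)

# The difference in group means then estimates the intervention effect.
effect_estimate = (mean(scores[i] for i in treated_ids)
                   - mean(scores[i] for i in control_ids))

print(f"difference in mean age between groups: {age_gap:.2f} years")
print(f"estimated intervention effect:         {effect_estimate:.2f}")
```

Because group membership is decided by chance rather than by any participant characteristic, the groups end up similar in age on average, and the difference in mean scores lands close to the assumed true effect.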

Frequently asked questions about correlation and causation

What is a correlation?

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no linear relationship between the variables.
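
The three cases above can be illustrated with the Pearson correlation coefficient, one common measure of linear association. The sketch below uses small made-up datasets chosen so that the coefficient comes out positive, negative, and zero; the helper name `pearson_r` and the data are illustrative assumptions.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

x = [1, 2, 3, 4, 5]

positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises as x rises
negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
zero = pearson_r(x, [2, 1, 3, 1, 2])        # no linear trend in y

print(f"positive: {positive:+.2f}")
print(f"negative: {negative:+.2f}")
print(f"zero:     {zero:+.2f}")
```

The coefficient ranges from -1 to +1: the sign gives the direction of the association, and the absolute value gives its strength.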

What’s the difference between correlation and causation?

Correlation describes an association between variables: when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.

Causation means that changes in one variable bring about changes in the other; there is a cause-and-effect relationship between variables. The two variables are correlated with each other, and there’s also a causal link between them.

Why doesn’t correlation imply causation?

The third variable and directionality problems are two main reasons why correlation isn’t causation.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.

What’s the difference between correlational and experimental research?

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

Pritha Bhandari

Pritha has an academic background in English, psychology and cognitive neuroscience. As an interdisciplinary researcher, she enjoys writing articles explaining tricky research concepts for students and academics.