Sampling bias: What is it and why does it matter?
Sampling bias occurs when some members of a population are systematically more likely to be selected in a sample than others. It is also called ascertainment bias in medical fields.
Sampling bias limits the generalizability of findings because it is a threat to external validity, specifically population validity. In other words, findings from biased samples can only be generalized to populations that share characteristics with the sample.
Causes of sampling bias
Your choice of research design or data collection method can lead to sampling bias. Sampling bias can occur in both probability and non-probability sampling.
Sampling bias in probability samples
In probability sampling, every member of the population has a known, non-zero chance of being selected. For instance, you can use a random number generator to select a simple random sample from your population.
Although this procedure reduces the risk of sampling bias, it may not eliminate it. If your sampling frame – the actual list of individuals that the sample is drawn from – does not match the population, this can result in a biased sample.
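To illustrate how a mismatched sampling frame can bias even a simple random sample, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population is synthetic, 30% of it is assumed to be unreachable online, and that group is assumed to score lower on the outcome. The function name `simulate_frame_bias` is hypothetical.

```python
import random
from statistics import mean


def simulate_frame_bias(seed=0, n_population=10_000, sample_size=500):
    """Compare a random sample from the full population with a random
    sample from an incomplete sampling frame (synthetic data only)."""
    rng = random.Random(seed)

    # Hypothetical population: each person is (reachable_online, score).
    # Assumption: about 70% are reachable online, and the unreachable
    # group has a lower average score.
    population = []
    for _ in range(n_population):
        online = rng.random() < 0.7
        score = rng.gauss(60 if online else 40, 10)
        population.append((online, score))

    # The sampling frame only lists people who can be reached online.
    frame = [person for person in population if person[0]]

    # Both samples are simple random samples; only the list differs.
    sample_from_population = rng.sample(population, sample_size)
    sample_from_frame = rng.sample(frame, sample_size)

    return {
        "population_mean": mean(score for _, score in population),
        "sample_from_population": mean(score for _, score in sample_from_population),
        "sample_from_frame": mean(score for _, score in sample_from_frame),
    }


if __name__ == "__main__":
    for name, value in simulate_frame_bias().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the sample drawn from the full population lands close to the population mean, while the sample drawn from the online-only frame overestimates it, even though both were selected at random.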
Sampling bias in non-probability samples
A non-probability sample is selected based on non-random criteria. For instance, in a convenience sample, participants are selected based on accessibility and availability.
Non-probability sampling often results in biased samples because some members of the population are more likely to be included than others.
Types of sampling bias
| Type | Explanation | Example |
|---|---|---|
| Self-selection | People with specific characteristics are more likely to agree to take part in a study than others. | People who are more thrill-seeking are more likely to take part in pain research studies. This may skew the data. |
| Non-response | People who refuse to participate or drop out from a study systematically differ from those who take part. | In a study on stress and workload, employees with high workloads are less likely to participate. The resulting sample may not vary greatly in terms of workload. |
| Undercoverage | Some members of a population are inadequately represented in the sample. | Administering general national surveys online may miss groups with limited internet access, such as the elderly and lower-income households. |
| Survivorship | Successful observations, people and objects are more likely to be represented in the sample than unsuccessful ones. | In scientific journals, there is strong publication bias towards positive results. Successful research outcomes are published far more often than null findings. |
| Pre-screening or advertising | The way participants are pre-screened or where a study is advertised may bias a sample. | When seeking volunteers to test a novel sleep intervention, you may end up with a sample that is more motivated to improve their sleep habits than the rest of the population. As a result, they may have been likely to improve their sleep habits regardless of the effects of your intervention. |
| Healthy user | Volunteers for preventative interventions are more likely to pursue health-boosting behaviors and activities than other members of the population. | A sample in a preventative intervention has a better diet and higher physical activity levels than most of the population, and is more likely to abstain from alcohol and avoid smoking. The experimental findings may be a result of the treatment interacting with these characteristics of the sample, rather than just the treatment itself. |
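The non-response example above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the workloads are synthetic, and the response model (the probability of responding falls as weekly hours rise) is invented for illustration. The function name `simulate_nonresponse` is hypothetical.

```python
import math
import random
from statistics import mean


def simulate_nonresponse(seed=1, n_employees=5_000):
    """Simulate a workload survey in which busier employees are
    less likely to respond (synthetic data only)."""
    rng = random.Random(seed)

    # Hypothetical weekly workloads in hours.
    workloads = [rng.gauss(45, 10) for _ in range(n_employees)]

    respondents = []
    for hours in workloads:
        # Assumed response model: a logistic curve centered on 45 hours,
        # so the chance of responding drops as workload increases.
        p_respond = 1 / (1 + math.exp((hours - 45) / 5))
        if rng.random() < p_respond:
            respondents.append(hours)

    return {
        "all_mean": mean(workloads),
        "respondent_mean": mean(respondents),
        "response_rate": len(respondents) / n_employees,
    }


if __name__ == "__main__":
    for name, value in simulate_nonresponse().items():
        print(f"{name}: {value:.2f}")
```

In this setup, the average workload among respondents comes out lower than the average across all employees, so a survey that only analyzes respondents would understate workload.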
How to avoid or correct sampling bias
Using careful research design and sampling procedures can help you avoid sampling bias.
- Define a target population and a sampling frame (the list of individuals that the sample will be drawn from). Match the sampling frame to the target population as much as possible to reduce the risk of sampling bias.
- Make online surveys as short and accessible as possible.
- Follow up on non-responders.
- Avoid convenience sampling.
Oversampling to avoid bias
Oversampling can be used to avoid sampling bias in situations where members of defined groups are underrepresented (undercoverage). This is a method of selecting respondents from some groups so that they make up a larger share of the sample than they actually do in the population.
After all data is collected, responses from oversampled groups are weighted to their actual share of the population to correct for their over-representation in the sample.
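The weighting step can be sketched as follows. This is a minimal illustration, not a full survey-weighting procedure: it assumes you already know each group's true population share, and the group names, shares, and response values are made up. The helper names `group_weights` and `weighted_mean` are hypothetical.

```python
def group_weights(population_shares, sample_counts):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_mean(responses, weights):
    """Mean of (group, value) responses, weighting each by its group."""
    numerator = sum(weights[group] * value for group, value in responses)
    denominator = sum(weights[group] for group, _ in responses)
    return numerator / denominator


# Made-up example: rural residents are 10% of the population but were
# oversampled so that they make up 50% of the sample.
population_shares = {"urban": 0.9, "rural": 0.1}
responses = [("urban", 70.0)] * 50 + [("rural", 50.0)] * 50
sample_counts = {"urban": 50, "rural": 50}

weights = group_weights(population_shares, sample_counts)
unweighted = sum(value for _, value in responses) / len(responses)
weighted = weighted_mean(responses, weights)

if __name__ == "__main__":
    print(f"unweighted mean: {unweighted:.1f}")
    print(f"weighted mean:   {weighted:.1f}")
```

Here the unweighted mean (60) treats the two groups as equally common, while the weighted mean (68) reflects the 90/10 split assumed for the population. Oversampling still pays off because the rural estimate rests on 50 responses rather than the handful a proportional sample would have produced.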
Frequently asked questions about sampling bias
- What is sampling?
A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
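- What is sampling bias?
Sampling bias occurs when some members of a population are systematically more likely to be selected in a sample than others.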
- Why is sampling bias important?
Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.
- What are some types of sampling bias?
Some common types of sampling bias include self-selection, non-response, undercoverage, survivorship, pre-screening or advertising, and healthy user bias.
- How do you avoid sampling bias?