An introduction to multistage sampling
In multistage sampling, or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups (units) at each stage. It’s often used to collect data from a large, geographically spread group of people in national surveys.
Single-stage vs multistage sampling
In single-stage sampling, you divide a population into units (e.g., households or individuals) and select a sample directly by collecting data from everyone in the selected units.
In multistage sampling, you divide the population into smaller and smaller groupings to create a sample using several steps. You can take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.
You can use either probability or non-probability sampling methods in single-stage and multistage sampling. But for external validity, or generalizability, it’s best to use probability sampling methods, which allow for stronger statistical inferences.
In single-stage probability sampling, you start with a sampling frame, which is a list of every member in the entire population. It should be as complete as possible, so that your sample accurately reflects your population.
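For comparison with the multistage designs below, here is a minimal Python sketch of single-stage probability sampling from a complete sampling frame, using simple random sampling. The frame of 1,000 people and the function name `simple_random_sample` are hypothetical and only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from a complete sampling frame.

    Every member of the frame has the same chance of selection,
    which is what makes this a probability sample.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: one entry per member of the population.
frame = [f"person_{i}" for i in range(1000)]
sample = simple_random_sample(frame, 50, seed=1)
print(len(sample))  # 50
```

This only works because `frame` lists everyone; multistage sampling relaxes that requirement.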
Cluster vs stratified sampling
In cluster sampling and stratified sampling, you divide up your population into groups that are mutually exclusive and exhaustive.
In cluster sampling, the population is divided into clusters, which are usually based on geography (e.g., cities or states) or organization (e.g., schools or universities). In single-stage cluster sampling, you randomly select some of the clusters for your sample and collect data from everyone within those clusters in one stage.
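Single-stage cluster sampling can be sketched in a few lines of Python: pick some clusters at random, then keep every member of each chosen cluster. The schools and their members below are hypothetical, and `single_stage_cluster_sample` is an illustrative name, not a library function.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters clusters and keep every member of each.

    clusters: dict mapping a cluster name (e.g., a school) to its members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sort keys for reproducibility
    # One stage only: everyone in a selected cluster is included.
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population grouped into schools (the clusters).
schools = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1"],
}
sample = single_stage_cluster_sample(schools, 2, seed=7)
```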
In stratified sampling, the population is divided into strata, which are often based on demographic characteristics such as race, gender, or socioeconomic status. Every unit or member of the population is placed in exactly one stratum. You select some members from each stratum so that all groups are represented in your sample.
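Stratified sampling differs in that every stratum contributes to the sample. The sketch below assumes equal allocation (the same number of members from each stratum); proportional allocation is another common choice. The population of `(id, group)` pairs is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then randomly sample within each one.

    stratum_of: function returning the stratum label for a member.
    n_per_stratum: equal allocation per stratum (an illustrative assumption).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)  # each member lands in exactly one stratum
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }

# Hypothetical population of (id, socioeconomic group) pairs.
population = [(i, ["low", "middle", "high"][i % 3]) for i in range(30)]
sample = stratified_sample(population, lambda person: person[1], 4, seed=2)
```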
Multistage sampling is often considered an extended version of cluster sampling, and it frequently combines cluster and stratified sampling.
In multistage sampling, you divide the population into clusters and select some clusters at the first stage. At each subsequent stage, you further divide up those selected clusters into smaller clusters, and repeat the process until you get to the last step. At the last step, you only select some members of each cluster for your sample.
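The stage-by-stage process can be sketched recursively in Python, assuming the population is stored as nested dictionaries (for example state, then city, then households) and that simple random selection is used at every stage. The hierarchy, the stage sizes, and the name `multistage_sample` are all hypothetical.

```python
import random

def multistage_sample(node, sizes, rng):
    """Select clusters stage by stage, then members at the last stage.

    node:  nested dicts of clusters; the innermost values are lists of members.
    sizes: how many units to select at each stage, outermost stage first.
           It needs one entry per level of nesting plus one for the members.
    """
    k, rest = sizes[0], sizes[1:]
    if isinstance(node, dict):
        # Intermediate stage: keep only some clusters, then go one level down.
        chosen = rng.sample(sorted(node), min(k, len(node)))
        return {name: multistage_sample(node[name], rest, rng) for name in chosen}
    # Last stage: select only some members of each remaining cluster.
    return rng.sample(node, min(k, len(node)))

# Hypothetical hierarchy: state -> city -> households.
population = {
    "state_1": {"city_1a": ["h1", "h2", "h3", "h4"], "city_1b": ["h5", "h6", "h7"]},
    "state_2": {"city_2a": ["h8", "h9"], "city_2b": ["h10", "h11", "h12"]},
    "state_3": {"city_3a": ["h13", "h14", "h15"]},
}
# Stage 1: 2 states; stage 2: 1 city per state; stage 3: 2 households per city.
sample = multistage_sample(population, [2, 1, 2], random.Random(42))
```

Only the selected states and cities are ever opened up, which is why no population-wide list of households is needed.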
Like in single-stage sampling, you start by defining your target population. But in multistage sampling, you don’t need a sampling frame that lists every member of the population. That’s why this method is useful for collecting data from large, dispersed populations.
In multistage sampling, you always go from higher-level to lower-level clusters at each stage. The clusters are often referred to as sampling units.
At the first stage, you divide up the population into clusters and select some of them: these are your primary sampling units (PSUs).
At the second stage, you divide up your PSUs into further clusters, and select some of them as your secondary sampling units (SSUs).
You can end at the second stage, or continue this process with as many stages as you need. In the last stage, you’ll get to your final sample of ultimate sampling units (USUs).
For a probability sample, you must use a probability sampling technique to select units at every stage. But you can mix it up by using simple random, systematic, or stratified methods at different stages, depending on what’s relevant and applicable to your study.
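As an illustration of mixing methods, the sketch below uses simple random sampling to choose districts at the first stage and systematic sampling (every k-th unit after a random start) to choose schools within each district at the second stage. The districts, schools, and helper names are hypothetical; any probability method could be swapped in at either stage.

```python
import random

def simple_random(units, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(units, n)

def systematic(units, n, rng):
    """Pick every k-th unit after a random start, where k = len(units) // n."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    k = len(units) // n
    start = rng.randrange(k)  # random start within the first interval
    return units[start::k][:n]

# Hypothetical frame of districts, each listing its schools in a fixed order.
districts = {f"district_{d}": [f"d{d}_school_{s}" for s in range(10)] for d in range(6)}

rng = random.Random(3)
psus = simple_random(sorted(districts), 2, rng)             # stage 1: simple random
ssus = {d: systematic(districts[d], 3, rng) for d in psus}  # stage 2: systematic
```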
First stage: Primary sampling units
At the first stage, like in cluster sampling, you’ll divide your population into clusters that are mutually exclusive and exhaustive.
Then, you’ll choose some of your clusters to be your primary sampling units, ideally using a probability sampling method. You can use any of the single-stage sampling methods to select your PSUs.
Large-scale surveys often use a combination of cluster and stratified sampling at the first stage to help ensure that the units are representative of the larger population. This is called a stratified multistage sample.
You begin by stratifying your clusters at the first stage. After stratification, you select clusters using a probability sampling method.
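A stratified first stage can be sketched as follows: group the candidate clusters by stratum, then randomly select clusters within each stratum so that no stratum is left out. The counties, their region labels, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def select_psus_stratified(cluster_strata, per_stratum, rng):
    """Stratify clusters first, then randomly select clusters within each stratum.

    cluster_strata: dict mapping each cluster (candidate PSU) to its stratum.
    """
    by_stratum = defaultdict(list)
    for cluster in sorted(cluster_strata):
        by_stratum[cluster_strata[cluster]].append(cluster)
    # Simple random selection inside every stratum, so each stratum is represented.
    return {
        stratum: rng.sample(clusters, min(per_stratum, len(clusters)))
        for stratum, clusters in by_stratum.items()
    }

# Hypothetical counties labeled by region.
counties = {
    "county_1": "north", "county_2": "north", "county_3": "north",
    "county_4": "south", "county_5": "south",
    "county_6": "west", "county_7": "west", "county_8": "west",
}
psus = select_psus_stratified(counties, 2, random.Random(5))
```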
Single-stage cluster sampling ends at this point because you would collect data from everyone within your selected clusters (the PSUs). This is often infeasible in practice, so multistage sampling goes further by sampling within each selected cluster to create smaller units.
Second stage: Secondary sampling units
At the second stage, you divide up your PSUs to get to smaller sampling units. You’ll select only some of these smaller units from within each selected PSU: these are your secondary sampling units (SSUs).
If you end your sampling at this point, it’s called two-stage or double-stage sampling. This would mean collecting data from everyone in your secondary sampling units (e.g., all students in the selected schools, if your SSUs are schools).
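A two-stage design that stops at the SSUs might look like this in Python, assuming districts as PSUs, schools as SSUs, and simple random selection at both stages. All names and sizes are hypothetical.

```python
import random

def two_stage_sample(districts, n_districts, n_schools, rng):
    """Stage 1: select districts (PSUs). Stage 2: select schools (SSUs) within each.

    Sampling stops here, so every student in a selected school is included.
    districts: dict mapping district -> dict mapping school -> list of students.
    """
    respondents = []
    for district in rng.sample(sorted(districts), n_districts):
        schools = districts[district]
        for school in rng.sample(sorted(schools), n_schools):
            respondents.extend(schools[school])  # collect data from everyone in the SSU
    return respondents

# Hypothetical population: 5 districts x 4 schools x 25 students.
districts = {
    f"district_{d}": {
        f"school_{d}_{s}": [f"student_{d}_{s}_{i}" for i in range(25)]
        for s in range(4)
    }
    for d in range(5)
}
respondents = two_stage_sample(districts, n_districts=2, n_schools=2, rng=random.Random(11))
print(len(respondents))  # 100
```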
Continuing with more stages is optional, but it can often make the research process simpler, because you no longer need to collect data from everyone in each selected unit.
Final stage: Ultimate sampling units
You can keep repeating the process of dividing up each sampling unit further and selecting a few of them for the next stage. At the final stage, you end with your ultimate sampling units.
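One consequence of selecting units stage by stage: if each stage uses simple random sampling without replacement, an ultimate sampling unit’s overall chance of selection is the product of its selection probabilities at each stage, and the inverse of that product is commonly used as its design weight. The sketch below assumes that design; the stage sizes are hypothetical.

```python
from fractions import Fraction

def inclusion_probability(stages):
    """Overall selection probability of a final-stage unit.

    stages: (selected, available) pairs, outermost stage first.
    Assumes simple random sampling without replacement at every stage.
    """
    p = Fraction(1)
    for selected, available in stages:
        p *= Fraction(selected, available)  # conditional probability at this stage
    return p

# Hypothetical design: 10 of 50 districts, 4 of 20 schools per district,
# then all 30 students in each selected school.
p = inclusion_probability([(10, 50), (4, 20), (30, 30)])
print(p)      # 1/25
print(1 / p)  # 25, the design weight for each student
```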
Advantages and disadvantages
Multistage sampling is effective and flexible with large, dispersed populations, but it may be difficult to ensure your sample is representative of the population.
Advantages
- You don’t need to start with a sampling frame of your target population.
- Compared to a simple random sample, it’s relatively inexpensive and effective when you have a large or geographically dispersed population.
- It’s flexible: you can vary sampling methods between stages based on what’s appropriate or feasible.
Disadvantages
- Compared to simple random samples, you’ll need a larger sample size for a multistage sample to achieve the same statistical precision, because members of the same cluster tend to resemble each other.
- The best choice of sampling method at each stage is very subjective, so you’ll need clear reasoning for your decision.
- It can lead to unrepresentative samples because large sections of populations may not be selected for sampling.
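The larger sample size needed for a multistage sample is often estimated with Kish’s approximate design effect, 1 + (m − 1)ρ, where m is the number of members sampled per final-stage cluster and ρ is the intraclass correlation. This formula is not given in this article; it is a standard approximation that assumes equal cluster sizes. With 20 people per cluster and ρ = 0.05, the design effect is 1 + 19 × 0.05 = 1.95, so roughly 780 respondents would be needed to match the precision of a 400-person simple random sample. The planning figures below are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Kish's approximate design effect for equal-sized clusters.

    cluster_size: members sampled per final-stage cluster.
    icc: intraclass correlation (how alike members of a cluster are), 0 to 1.
    """
    return 1 + (cluster_size - 1) * icc

def required_sample_size(n_srs, cluster_size, icc):
    """Inflate a simple-random-sample size by the design effect (rounded up)."""
    return math.ceil(n_srs * design_effect(cluster_size, icc))

# Hypothetical planning figures: 400 respondents would suffice under simple random
# sampling; the multistage design interviews 20 people per cluster, with an ICC of 0.05.
n_needed = required_sample_size(400, cluster_size=20, icc=0.05)
```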
Frequently asked questions about multistage sampling
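- What is probability sampling?
Probability sampling means that every member of the target population has a known chance of being included in the sample. Probability sampling methods include simple random sampling, systematic sampling, stratified sampling, and cluster sampling.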
- What is multistage sampling?
In multistage sampling, or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage. This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.
- Is multistage sampling a probability sampling method?
Multistage sampling can be a probability sampling method, but only if you use probability sampling at every stage. You can mix it up by using simple random sampling, systematic sampling, or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.
- What are the pros and cons of multistage sampling?
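Multistage sampling is relatively inexpensive and flexible for large, geographically dispersed populations, and you don’t need a sampling frame of the entire target population. But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples.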