An introduction to cluster sampling

In cluster sampling, researchers divide a population into smaller groups known as clusters. They then randomly select among these clusters to form a sample.

Cluster sampling is a method of probability sampling that is often used to study large populations, particularly those that are widely geographically dispersed. Researchers usually use pre-existing units such as schools or cities as their clusters.

How to cluster sample

The simplest form of cluster sampling is single-stage cluster sampling. It involves four key steps.

Research example
You are interested in the average reading level of all the seventh-graders in your city.

It would be very difficult to obtain a list of all seventh-graders and collect data from a random sample spread across the city. However, you can easily obtain a list of all schools and collect data from a subset of these. You thus decide to use the cluster sampling method.

Step 1: Define your population

As with other forms of sampling, you must begin by clearly defining the population you wish to study.

Population
In your reading-level study, your population is all the seventh-graders in your city.

Step 2: Divide your population into clusters

This is the most important part of the process. The quality of your clusters and how well they represent the larger population determine the validity of your results. Ideally, your clusters should meet the following criteria:

  • Each cluster’s population should be as diverse as possible. You want every potential characteristic of the entire population to be represented in each cluster.
  • Each cluster should have a distribution of characteristics similar to that of the population as a whole.
  • Taken together, the clusters should cover the entire population.
  • There should not be any overlap between clusters (i.e. the same people or units do not appear in more than one cluster).

Ideally, each cluster should be a mini-representation of the entire population. However, in practice, clusters often do not perfectly represent the population’s characteristics, which is why this method provides less statistical certainty than simple random sampling.

Because clusters are usually naturally occurring groups, such as schools, cities, or households, they are often more homogeneous than the population as a whole. You should be aware of this when performing your study, as it might affect its validity.

Clusters
You cluster the seventh-graders by the school they attend. To cover the whole population, you need to include every school in the city. There is no overlap because each student attends only one school.
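
The coverage and no-overlap criteria can be checked mechanically once you have a roster. The following is a minimal sketch, assuming a hypothetical roster of (student ID, school) pairs; the names `roster`, `build_clusters`, and `is_partition` are illustrative and not part of any standard tool.

```python
from collections import defaultdict

# Hypothetical roster: (student_id, school) pairs. All values are made up.
roster = [
    ("s01", "North Middle"),
    ("s02", "North Middle"),
    ("s03", "Lakeside"),
    ("s04", "Lakeside"),
    ("s05", "Hillcrest"),
]


def build_clusters(roster):
    """Group student IDs by the school they attend."""
    clusters = defaultdict(list)
    for student_id, school in roster:
        clusters[school].append(student_id)
    return dict(clusters)


def is_partition(clusters, population):
    """Return True if the clusters cover the population with no overlap."""
    assigned = [s for members in clusters.values() for s in members]
    no_overlap = len(assigned) == len(set(assigned))
    full_coverage = set(assigned) == set(population)
    return no_overlap and full_coverage


clusters = build_clusters(roster)
population = [student_id for student_id, _ in roster]
print(is_partition(clusters, population))
```

If a student were listed under two schools, or a school were left out, the check would return False and the clusters would need to be redefined before sampling.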

Step 3: Randomly select clusters to use as your sample

If each cluster is itself a mini-representation of the larger population, randomly selecting and sampling from the clusters allows you to imitate simple random sampling, which in turn supports the validity of your results.

Conversely, if the clusters are not representative, then random sampling will allow you to gather data on a diverse array of clusters, which should still provide you with an overview of the population as a whole.

Sample
You assign a number to each school and use a random number generator to select a random sample.
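
As a rough illustration of this step, the sketch below draws schools at random without replacement using Python's standard `random` module. The school names, the number of clusters `k`, and the helper `select_clusters` are assumptions made for the example.

```python
import random

# Hypothetical list of every school (cluster) in the city.
schools = [
    "North Middle", "Lakeside", "Hillcrest",
    "Riverside", "Oak Park", "Eastview",
]


def select_clusters(cluster_names, k, seed=None):
    """Randomly select k clusters without replacement.

    A fixed seed makes the draw reproducible, which helps when
    documenting how the sample was chosen.
    """
    rng = random.Random(seed)
    return rng.sample(cluster_names, k)


selected_schools = select_clusters(schools, k=3, seed=42)
print(selected_schools)
```

Each school has the same chance of being selected, which is what makes this a probability sampling method.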

You choose the number of clusters based on how large you want your sample to be. This in turn is based on the estimated size of the entire seventh-grade population, your desired confidence interval (margin of error) and confidence level, and your best guess of the standard deviation (a measure of how spread apart the values in a population are) of the reading levels of the seventh-graders.

You then use a sample size calculator to estimate the required sample size.
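
Sample size calculators differ in their details. One common approach for estimating a mean, sketched below, uses the textbook formula for simple random sampling with a finite population correction. This is an assumption about what such a calculator does, not a description of any specific tool, and because it ignores the similarity of units within a cluster, it will likely understate the sample needed for a cluster design. All input values and the helper names `required_sample_size` and `clusters_needed` are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population_size, margin_of_error, std_dev,
                         confidence_level=0.95):
    """Estimate the sample size needed to estimate a mean.

    Assumes simple random sampling; no adjustment for clustering.
    """
    # z-score for a two-sided interval at the given confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size for an effectively infinite population
    n0 = (z * std_dev / margin_of_error) ** 2
    # Finite population correction
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


def clusters_needed(sample_size, average_cluster_size):
    """Number of clusters required to reach the target sample size."""
    return math.ceil(sample_size / average_cluster_size)


# Hypothetical inputs: 5,000 seventh-graders, reading levels measured in
# grade equivalents, guessed standard deviation of 1.5, margin of error 0.25.
n = required_sample_size(5000, margin_of_error=0.25, std_dev=1.5)
print(n, clusters_needed(n, average_cluster_size=90))
```

With these made-up inputs, the formula suggests roughly 135 students, which two schools of about 90 seventh-graders each would cover.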

Step 4: Collect data from the sample

You then conduct your study and collect data from every unit in the selected clusters.

Data collection
You test the reading levels of every seventh-grader in the schools that were randomly selected for your sample.
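
Putting the steps together, a single-stage design might be sketched as follows. The reading levels are invented for illustration, and pooling all tested students into one average is only one simple way to estimate the population mean. The helper `single_stage_sample` is hypothetical.

```python
import random
from statistics import mean

# Hypothetical reading levels (grade equivalents) for every
# seventh-grader, keyed by school. All values are made up.
reading_levels = {
    "North Middle": [6.8, 7.1, 7.4, 6.5],
    "Lakeside": [7.9, 8.2, 7.6],
    "Hillcrest": [6.2, 6.9, 7.0, 6.7, 7.3],
    "Riverside": [7.5, 7.2, 8.0],
}


def single_stage_sample(data, k, seed=None):
    """Select k clusters at random and keep every unit in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(data), k)
    return {school: data[school] for school in chosen}


sample = single_stage_sample(reading_levels, k=2, seed=1)
all_scores = [score for scores in sample.values() for score in scores]
estimated_mean = mean(all_scores)
print(sorted(sample), round(estimated_mean, 2))
```

Note that the sample size is not fixed in advance: it depends on how many students attend the schools that happen to be selected.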

Multi-stage cluster sampling

In multi-stage clustering, rather than collect data from every single unit in the selected clusters, you randomly select individual units from within the cluster to use as your sample.

You can then collect data from each of these individual units – this is known as double-stage sampling.

You can also continue this procedure, taking progressively smaller and smaller random samples, which is usually called multi-stage sampling.

You should use this method when it is infeasible or too expensive to test the entire cluster.

Example: Multi-stage sampling
Instead of collecting data from every seventh-grader in the selected schools, you narrow down your sample in two additional stages:

  1. From each school, you randomly select a sample of seventh-grade classes.
  2. From within those classes, you randomly select a sample of students.

The resulting sample is much smaller and therefore easier to collect data from.
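
The staged selection above can be sketched in code. This is a minimal illustration under stated assumptions: the nested structure (school, then class, then student), the counts at each stage, and the helper `multistage_sample` are all hypothetical.

```python
import random

# Hypothetical population: 4 schools, 3 seventh-grade classes per school,
# 10 students per class. IDs encode school, class, and student number.
school_data = {
    f"School {s}": {
        f"7-{c}": [f"School {s}/7-{c}/{i:02d}" for i in range(1, 11)]
        for c in range(1, 4)
    }
    for s in "ABCD"
}


def multistage_sample(schools, n_schools, n_classes, n_students, seed=None):
    """Randomly select schools, then classes, then students."""
    rng = random.Random(seed)
    sample = []
    # Stage 1: select schools (clusters)
    for school in rng.sample(sorted(schools), n_schools):
        classes = schools[school]
        # Stage 2: select classes within each selected school
        picked = rng.sample(sorted(classes), min(n_classes, len(classes)))
        for class_name in picked:
            students = classes[class_name]
            # Stage 3: select students within each selected class
            sample.extend(
                rng.sample(students, min(n_students, len(students)))
            )
    return sample


students = multistage_sample(
    school_data, n_schools=2, n_classes=2, n_students=5, seed=7
)
print(len(students))
```

Here 2 schools, 2 classes per school, and 5 students per class yield 20 students, compared with 60 if every seventh-grader in the two selected schools were tested.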

Advantages and disadvantages

Cluster sampling is commonly used for its practical advantages, but it has some disadvantages in terms of statistical validity.

Advantages

  • Cluster sampling is time- and cost-efficient, especially for populations that are widely geographically spread and would be difficult to sample properly otherwise.
  • Because cluster sampling uses randomization, if the population is clustered properly, your study will have high external validity because your sample will reflect the characteristics of the larger population.

Disadvantages

  • Internal validity is less strong than with simple random sampling, particularly as you use more stages of clustering.
  • If your clusters are not a good mini-representation of the population as a whole, then it is more difficult to rely upon your sample to provide valid results.
  • Cluster sampling is much more complex to plan than other forms of sampling.

Frequently asked questions about cluster sampling

What is cluster sampling?

Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.

The clusters should ideally each be mini-representations of the population as a whole.

What are the types of cluster sampling?

There are three types of cluster sampling: single-stage, double-stage and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample.

  • In single-stage sampling, you collect data from every unit within the selected clusters.
  • In double-stage sampling, you select a random sample of units from within the clusters.
  • In multi-stage sampling, you repeat the procedure of randomly sampling elements from within the clusters until you have reached a manageable sample size.

What are some advantages and disadvantages of cluster sampling?

Cluster sampling is more time- and cost-efficient than other probability sampling methods, particularly when it comes to large populations spread across a wide geographical area.

However, it provides less statistical certainty than other methods, such as simple random sampling, because it is difficult to ensure that your clusters properly represent the population as a whole.

Lauren Thomas

Lauren has a bachelor's degree in Economics and Political Science and is currently finishing up a master's in Economics. She is always on the move, having lived in five cities in both the US and France, and is happy to have a job that will follow her wherever she goes.
