Chi-Square (Χ²) Tests | Types, Formula & Examples
A Pearson’s chi-square test is a statistical test for categorical data. It is used to determine whether your data are significantly different from what you expected. There are two types of Pearson’s chi-square tests:
- The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations.
- The chi-square test of independence is used to test whether two categorical variables are related to each other.
Chi-square is often written as Χ² and is pronounced “kai-square” (rhymes with “eye-square”). It is also called chi-squared.
What is a chi-square test?
Pearson’s chi-square (Χ²) tests, often referred to simply as chi-square tests, are among the most common nonparametric tests. Nonparametric tests are used for data that don’t follow the assumptions of parametric tests, especially the assumption of a normal distribution.
If you want to test a hypothesis about the distribution of a categorical variable, you’ll need to use a chi-square test or another nonparametric test. Categorical variables can be nominal or ordinal and represent groupings such as species or nationalities. Because they can only take a few specific values, they can’t have a normal distribution.
Test hypotheses about frequency distributions
There are two types of Pearson’s chi-square tests, but they both test whether the observed frequency distribution of a categorical variable is significantly different from its expected frequency distribution. A frequency distribution describes how observations are distributed between different groups.
Frequency distributions are often displayed using frequency distribution tables. A frequency distribution table shows the number of observations in each group. When there are two categorical variables, you can use a specific type of frequency distribution table called a contingency table to show the number of observations in each combination of groups.
For one categorical variable, a chi-square goodness of fit test can test whether the observed frequencies are significantly different from what was expected, such as equal frequencies.
For two categorical variables, such as handedness and nationality, a chi-square test of independence can test whether the observed frequencies are significantly different from the frequencies expected if handedness is unrelated to nationality.
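As a rough illustration of how a frequency distribution table and a contingency table can be tallied from raw observations, here is a minimal Python sketch. The data are invented for the example and the variable names are hypothetical; only the handedness and nationality variables come from the text above.

```python
from collections import Counter

# Hypothetical raw data: one (nationality, handedness) pair per person.
observations = [
    ("American", "right"), ("American", "right"), ("American", "left"),
    ("Canadian", "right"), ("Canadian", "left"), ("Canadian", "right"),
    ("American", "right"), ("Canadian", "right"),
]

# Frequency distribution table for one variable: number of observations per group.
handedness_counts = Counter(hand for _, hand in observations)

# Contingency table for two variables: number of observations
# in each combination of groups (rows = nationality, columns = handedness).
pair_counts = Counter(observations)
rows = sorted({nat for nat, _ in observations})
cols = sorted({hand for _, hand in observations})
contingency = [[pair_counts[(r, c)] for c in cols] for r in rows]

print(dict(handedness_counts))
print(rows, cols, contingency)
```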
The chi-square formula
Both of Pearson’s chi-square tests use the same formula to calculate the test statistic, chi-square (Χ²):
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
- Χ² is the chi-square test statistic
- Σ is the summation operator (it means “take the sum of”)
- O is the observed frequency
- E is the expected frequency
The larger the difference between the observations and the expectations (O − E in the equation), the bigger the chi-square will be. To decide whether the difference is big enough to be statistically significant, you compare the chi-square value to a critical value.
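The calculation described above (square each difference O − E, divide by E, and sum over all groups) can be sketched in a few lines of Python. The function name `chi_square` and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def chi_square(observed, expected):
    """Return the sum over all groups of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 75 observations in three groups,
# with equal frequencies (25 per group) expected.
observed = [22, 30, 23]
expected = [25, 25, 25]
statistic = chi_square(observed, expected)

# (9 + 25 + 4) / 25 = 1.52
print(round(statistic, 2))
```

Larger gaps between the observed and expected counts increase each term of the sum, which is why the statistic grows with the size of the discrepancy.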
When to use a chi-square test
A Pearson’s chi-square test may be an appropriate option for your data if all of the following are true:
- You want to test a hypothesis about one or more categorical variables. If one or more of your variables is quantitative, you should use a different statistical test. Alternatively, you could convert the quantitative variable into a categorical variable by separating the observations into intervals.
- The sample was randomly selected from the population.
- There are a minimum of five observations expected in each group or combination of groups.
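The last condition can be checked before running the test. The sketch below assumes you have already worked out the expected frequencies; the helper name `meets_minimum_expected` is hypothetical.

```python
def meets_minimum_expected(expected, minimum=5):
    """Return True if every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical examples with equal expected frequencies.
# 75 observations over 3 groups -> 25 expected per group.
large_sample = [75 / 3] * 3
# 18 observations over 4 groups -> 4.5 expected per group.
small_sample = [18 / 4] * 4

print(meets_minimum_expected(large_sample))
print(meets_minimum_expected(small_sample))
```

Note that the check applies to the expected frequencies, not the observed ones.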
Types of chi-square tests
The two types of Pearson’s chi-square tests are:
- Chi-square goodness of fit test
- Chi-square test of independence
Mathematically, these are actually the same test. However, we often think of them as different tests because they’re used for different purposes.
Chi-square goodness of fit test
You can use a chi-square goodness of fit test when you have one categorical variable. It allows you to test whether the frequency distribution of the categorical variable is significantly different from your expectations. Often, but not always, the expectation is that the categories will have equal proportions.
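For a goodness of fit test, the expected frequencies come from the proportions stated in your null hypothesis. The sketch below is one way this might look; the function names and counts are hypothetical, and the degrees of freedom are taken as the number of groups minus one, which assumes no parameters were estimated from the data.

```python
def expected_from_proportions(total, proportions):
    """Turn hypothesized proportions into expected frequencies."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom)."""
    expected = expected_from_proportions(sum(observed), proportions)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters estimated from the data
    return statistic, df


observed = [50, 30, 20]  # hypothetical counts, N = 100

# Expectation 1: equal proportions in all three groups.
equal_stat, equal_df = goodness_of_fit(observed, [1 / 3, 1 / 3, 1 / 3])

# Expectation 2: unequal proportions of 50%, 25%, and 25%.
unequal_stat, unequal_df = goodness_of_fit(observed, [0.5, 0.25, 0.25])

print(round(equal_stat, 2), equal_df)
print(round(unequal_stat, 2), unequal_df)
```

The same observed counts give a very different statistic under the two expectations, which shows why choosing the expected proportions carefully matters.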
Chi-square test of independence
You can use a chi-square test of independence when you have two categorical variables. It allows you to test whether the two variables are related to each other. If two variables are independent (unrelated), the probability of belonging to a certain group of one variable isn’t affected by the other variable.
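Under independence, the expected frequency of each cell is its row total multiplied by its column total, divided by the overall sample size. The following sketch applies that to a small contingency table; the function names and counts are hypothetical.

```python
def expected_if_independent(table):
    """Expected cell counts if the row and column variables are unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_if_independent(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 contingency table (rows and columns are the
# groups of two categorical variables), N = 100.
table = [[30, 10],
         [20, 40]]

expected = expected_if_independent(table)
statistic, df = independence_statistic(table)
print(expected)
print(round(statistic, 2), df)
```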
Other types of chi-square tests
Some consider the chi-square test of homogeneity to be another variety of Pearson’s chi-square test. It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other. You can consider it simply a different way of thinking about the chi-square test of independence.
McNemar’s test is a test that uses the chi-square test statistic. It isn’t a variety of Pearson’s chi-square test, but it’s closely related. You can conduct this test when you have a related pair of categorical variables that each have two groups. It allows you to determine whether the proportions of the variables are equal.
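Example: suppose each person in a sample is asked whether they like chocolate and whether they like vanilla, and the answers are summarized in a 2 × 2 contingency table (like or dislike chocolate by like or dislike vanilla). McNemar’s test would use these hypotheses: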
- Null hypothesis (H0): The proportion of people who like chocolate is the same as the proportion of people who like vanilla.
- Alternative hypothesis (HA): The proportion of people who like chocolate is different from the proportion of people who like vanilla.
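For a 2 × 2 table of paired answers like this, McNemar’s statistic depends only on the two discordant cells, that is, the people who like one flavor but not the other. The sketch below uses the common form without a continuity correction, (b − c)² / (b + c), with one degree of freedom; the counts and names are hypothetical.

```python
def mcnemar_statistic(b, c):
    """McNemar's chi-square statistic (no continuity correction).

    b: count liking vanilla but not chocolate
    c: count liking chocolate but not vanilla
    """
    if b + c == 0:
        raise ValueError("at least one discordant pair is required")
    return (b - c) ** 2 / (b + c)


# Hypothetical paired responses from 100 people.
like_both = 40
vanilla_only = 15      # b
chocolate_only = 5     # c
like_neither = 40

n = like_both + vanilla_only + chocolate_only + like_neither
prop_chocolate = (like_both + chocolate_only) / n
prop_vanilla = (like_both + vanilla_only) / n

statistic = mcnemar_statistic(vanilla_only, chocolate_only)
print(prop_chocolate, prop_vanilla, statistic)
```

The two proportions are equal exactly when b equals c, which is why the concordant cells (like both, like neither) drop out of the statistic.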
There are several other types of chi-square tests that are not Pearson’s chi-square tests, including the test of a single variance and the likelihood ratio chi-square test.
How to perform a chi-square test
The exact procedure for performing a Pearson’s chi-square test depends on which test you’re using, but it generally follows these steps:
- Create a table of the observed and expected frequencies. This can sometimes be the most difficult step because you will need to carefully consider which expected values are most appropriate for your null hypothesis.
- Calculate the chi-square value from your observed and expected frequencies using the chi-square formula.
- Find the critical chi-square value in a chi-square critical value table or using statistical software.
- Compare the chi-square value to the critical value to determine which is larger.
- Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-square value is greater than the critical value. If you reject the null hypothesis, you can conclude that your data are significantly different from what you expected.
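The steps above can be sketched end to end as follows. This is only an illustration: the counts are hypothetical, the significance level is assumed to be 0.05, and the small lookup table holds standard critical values for one to five degrees of freedom at that level rather than a full critical value table.

```python
# Standard chi-square critical values at a significance level of 0.05,
# keyed by degrees of freedom (only df 1 to 5 included here).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_decision(observed, expected, df):
    """Return (statistic, critical value, reject_null) at alpha = 0.05."""
    # Step 2: calculate the chi-square value.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 3: find the critical value.
    critical = CRITICAL_VALUES_05[df]
    # Steps 4 and 5: compare and decide.
    return statistic, critical, statistic > critical


# Step 1: table of observed and expected frequencies (hypothetical,
# equal frequencies expected across three groups, so df = 3 - 1 = 2).
expected = [25, 25, 25]
sample_a = [22, 30, 23]
sample_b = [40, 20, 15]

result_a = chi_square_decision(sample_a, expected, df=2)
result_b = chi_square_decision(sample_b, expected, df=2)
print(result_a)
print(result_b)
```

In this sketch the first sample stays below the critical value, so the null hypothesis is not rejected, while the second exceeds it.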
How to report a chi-square test
- You don’t need to provide a reference or formula since the chi-square test is a commonly used statistic.
- Refer to chi-square using its Greek symbol, Χ². Although the symbol looks very similar to an “X” from the Latin alphabet, it’s actually a different symbol. Greek symbols should not be italicized.
- Include a space on either side of the equal sign.
- If your chi-square is less than one, you should include a leading zero (a zero before the decimal point) since the chi-square can be greater than one.
- Report the chi-square value to two decimal places.
- Report the chi-square alongside its degrees of freedom, sample size, and p value, following this format: Χ² (degrees of freedom, N = sample size) = chi-square value, p = p value.
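If you report many tests, a small helper can apply this format consistently. The sketch below is hypothetical: the function name is made up, and the handling of the p value (three decimal places, no leading zero, and “p < .001” for very small values) is an assumption based on common APA practice rather than something stated in the guidelines above.

```python
def format_chi_square(statistic, df, n, p):
    """Format a chi-square result as 'Χ² (df, N = n) = value, p = value'."""
    # Assumption: p is shown to three decimals without a leading zero,
    # and very small values are reported as "p < .001".
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    # The chi-square value keeps its leading zero and two decimal places.
    return f"Χ² ({df}, N = {n}) = {statistic:.2f}, {p_text}"


# Hypothetical results.
report_a = format_chi_square(1.52, 2, 75, 0.468)
report_b = format_chi_square(0.5, 1, 40, 0.48)
report_c = format_chi_square(16.667, 1, 100, 0.00004)
print(report_a)
print(report_b)
print(report_c)
```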
Frequently asked questions about chi-square tests
- What are the two main types of chi-square tests?
- What is the difference between a chi-square test and a t test?
- What is the difference between a chi-square test and a correlation?
- What is the difference between quantitative and categorical variables?
Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).
Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).