# Chi-Square Test of Independence | Formula, Guide & Examples

A chi-square (Χ²) test of independence is a nonparametric hypothesis test. You can use it to test whether two categorical variables are related to each other.

## What is the chi-square test of independence?

A chi-square (Χ²) test of independence is a type of Pearson’s chi-square test. Pearson’s chi-square tests are nonparametric tests for categorical variables. They’re used to determine whether your data are significantly different from what you expected.

You can use a chi-square test of independence, also known as a chi-square test of association, to determine whether two categorical variables are related. If two variables are related, the probability of one variable having a certain value is dependent on the value of the other variable.

The chi-square test of independence calculations are based on the observed frequencies, which are the numbers of observations in each combined group.

The test compares the observed frequencies to the frequencies you would expect if the two variables are unrelated. When the variables are unrelated, the observed and expected frequencies will be similar.

### Contingency tables

When you want to perform a chi-square test of independence, the best way to organize your data is a type of frequency distribution table called a contingency table.

A contingency table, also known as a cross tabulation or crosstab, shows the number of observations in each combination of groups. It also usually includes row and column totals.

## Chi-square test of independence hypotheses

The chi-square test of independence is an inferential statistical test, meaning that it allows you to draw conclusions about a population based on a sample. Specifically, it allows you to conclude whether two variables are related in the population.

Like all hypothesis tests, the chi-square test of independence evaluates a null and alternative hypothesis. The hypotheses are two competing answers to the question “Are variable 1 and variable 2 related?”

• Null hypothesis (H0): Variable 1 and variable 2 are not related in the population; the proportions of variable 1 are the same for different values of variable 2.
• Alternative hypothesis (Ha): Variable 1 and variable 2 are related in the population; the proportions of variable 1 are not the same for different values of variable 2.

You can use the above sentences as templates. Replace variable 1 and variable 2 with the names of your variables.

### Expected values

A chi-square test of independence works by comparing the observed and the expected frequencies. The expected frequencies are such that the proportions of one variable are the same for all values of the other variable.
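As a sketch of this idea, each expected frequency is the row total times the column total, divided by the grand total. The Python snippet below works through a small hypothetical 2 × 2 contingency table (the numbers are made up for illustration):

```python
# Hypothetical 2x2 contingency table
# (rows: groups of variable 1, columns: groups of variable 2)
observed = [[20, 30],
            [30, 20]]

n = sum(sum(row) for row in observed)              # grand total
row_totals = [sum(row) for row in observed]        # totals per row
col_totals = [sum(col) for col in zip(*observed)]  # totals per column

# Expected frequency for row r, column c: (row total * column total) / n
expected = [[row_totals[r] * col_totals[c] / n
             for c in range(len(col_totals))]
            for r in range(len(row_totals))]

print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

Note that the expected frequencies make the proportions identical across rows: under independence, each column splits every row in the same ratio.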

You can calculate the expected frequencies using the contingency table. The expected frequency for row r and column c is:

Expected frequency = (row r total × column c total) / sample size

## When to use the chi-square test of independence

The following conditions are necessary if you want to perform a chi-square test of independence:

1. You want to test a hypothesis about the relationship between two categorical variables (binary, nominal, or ordinal).
• Chi-square tests of independence are usually performed on binary or nominal variables. They are sometimes performed on ordinal variables, although generally only on ordinal variables with fewer than five groups.
2. The sample was randomly selected from the population.
3. There are a minimum of five observations expected in each combined group.

## How to calculate the test statistic (formula)

Pearson’s chi-square (Χ²) is the test statistic for the chi-square test of independence:

Χ² = Σ (O − E)² / E

Where

• Χ² is the chi-square test statistic
• Σ is the summation operator (it means “take the sum of”)
• O is the observed frequency
• E is the expected frequency

The chi-square test statistic measures how much your observed frequencies differ from the frequencies you would expect if the two variables are unrelated. It is large when there’s a big difference between the observed and expected frequencies (O − E in the equation).

Follow these five steps to calculate the test statistic:

### Step 1: Create a table

Create a table with the observed and expected frequencies in two columns.

### Step 2: Calculate O − E

In a new column called “O − E”, subtract the expected frequencies from the observed frequencies.

### Step 3: Calculate (O − E)²

In a new column called “(O − E)²”, square the values in the previous column.

### Step 4: Calculate (O − E)² / E

In a final column called “(O − E)² / E”, divide the previous column by the expected frequencies.

### Step 5: Calculate Χ²

Finally, add up the values of the previous column to calculate the chi-square test statistic (Χ²).
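The five steps above can be sketched in Python. The observed and expected frequencies here are hypothetical, just to show the arithmetic:

```python
# Step 1: observed and expected frequencies as two parallel columns (hypothetical values)
observed = [36, 14, 30, 20]
expected = [33.0, 17.0, 33.0, 17.0]

# Steps 2-4: compute (O - E)^2 / E for each row of the table
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Step 5: sum the final column to get the chi-square test statistic
chi_square = sum(components)
print(round(chi_square, 3))  # 1.604
```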

## How to perform the chi-square test of independence

If the test statistic is big enough, you should conclude that the observed frequencies are not what you’d expect if the variables were unrelated. But what counts as big enough?

We compare the test statistic to a critical value from a chi-square distribution to decide whether it’s big enough to reject the null hypothesis that the two variables are unrelated. This procedure is called the chi-square test of independence.

Follow these steps to perform a chi-square test of independence (the first two steps have already been completed for the recycling example):

### Step 1: Calculate the expected frequencies

Use the contingency table to calculate the expected frequencies following the formula:

Expected frequency = (row total × column total) / sample size

### Step 2: Calculate chi-square

Use Pearson’s chi-square formula to calculate the test statistic:

Χ² = Σ (O − E)² / E

### Step 3: Find the critical chi-square value

You can find the critical value in a chi-square critical value table or using statistical software. You need to know two numbers to find the critical value:

• The degrees of freedom (df): For a chi-square test of independence, the df is (number of variable 1 groups − 1) × (number of variable 2 groups − 1).
• Significance level (α): By convention, the significance level is usually .05.
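For example, for a hypothetical 3 × 2 contingency table at α = .05, the degrees of freedom and critical value can be found like this (the small lookup dictionary holds standard chi-square critical values for α = .05):

```python
# Standard critical chi-square values at alpha = .05, keyed by degrees of freedom
critical_values_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

# df = (number of variable 1 groups - 1) * (number of variable 2 groups - 1)
rows, cols = 3, 2  # e.g. a 3 x 2 contingency table
df = (rows - 1) * (cols - 1)

print(df, critical_values_05[df])  # 2 5.991
```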

### Step 4: Compare the chi-square value to the critical value

Is the test statistic big enough to reject the null hypothesis? Compare it to the critical value to find out.

### Step 5: Decide whether to reject the null hypothesis

• If the Χ² value is greater than the critical value, then the difference between the observed and expected distributions is statistically significant (p < α). The data allows you to reject the null hypothesis and provides support for the alternative hypothesis that the variables are related.
• If the Χ² value is less than the critical value, then the difference between the observed and expected distributions is not statistically significant (p > α). The data doesn’t allow you to reject the null hypothesis that the variables are unrelated and doesn’t provide support for the alternative hypothesis that the variables are related.

### Step 6: Follow up with post hoc tests (optional)

If there are more than two groups in either of the variables and you rejected the null hypothesis, you may want to investigate further with post hoc tests. A post hoc test is a follow-up test that you perform after your initial analysis.

As with a one-way ANOVA with more than two groups, a significant result doesn’t tell you which groups’ proportions are significantly different from each other.

One post hoc approach is to compare each pair of groups using chi-square tests of independence and a Bonferroni correction. A Bonferroni correction is when you divide your original significance level (usually .05) by the number of tests you’re performing.
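For instance, a variable with three groups has three pairwise comparisons, so the Bonferroni-corrected significance level would be:

```python
from math import comb

alpha = 0.05
groups = 3                 # number of groups in the variable being compared
n_tests = comb(groups, 2)  # number of pairwise comparisons: 3 choose 2 = 3

# Bonferroni correction: divide the original significance level by the number of tests
corrected_alpha = alpha / n_tests
print(n_tests, round(corrected_alpha, 4))  # 3 0.0167
```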

## When to use a different test

Several tests are similar to the chi-square test of independence, so it may not always be obvious which to use. The best choice will depend on your variables, your sample size, and your hypotheses.

### When to use the chi-square goodness of fit test

There are two types of Pearson’s chi-square test. The chi-square test of independence is one of them, and the chi-square goodness of fit test is the other. The math is the same for both tests—the main difference is how you calculate the expected values.

You should use the chi-square goodness of fit test when you have one categorical variable and you want to test a hypothesis about its distribution.

### When to use Fisher’s exact test

If you have a small sample size (N < 100), Fisher’s exact test is a better choice. You should especially opt for Fisher’s exact test when your data doesn’t meet the condition of a minimum of five observations expected in each combined group.

### When to use McNemar’s test

You should use McNemar’s test when you have a closely related pair of categorical variables that each have two groups. It allows you to test whether the proportions of the variables are equal. This test is most often used to compare before and after observations of the same individuals.

### When to use a G test

A G test and a chi-square test give approximately the same results. G tests can accommodate more complex experimental designs than chi-square tests. However, the tests are usually interchangeable and the choice is mostly a matter of personal preference.

One reason to prefer chi-square tests is that they’re more familiar to researchers in most fields.

## Practice questions

Do you want to test your knowledge about the chi-square test of independence? Download our practice questions and examples below.


How do I perform a chi-square test of independence in Excel?

You can use the CHISQ.TEST() function to perform a chi-square test of independence in Excel. It takes two arguments, CHISQ.TEST(observed_range, expected_range), and returns the p value.

How do I perform a chi-square test of independence in R?

You can use the chisq.test() function to perform a chi-square test of independence in R. Give the contingency table as a matrix for the “x” argument. For example:

```r
m = matrix(data = c(89, 84, 86, 9, 8, 24), nrow = 3, ncol = 2)

chisq.test(x = m)
```
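To see what chisq.test computes, here is the same calculation done by hand on the same 3 × 2 table (shown in Python; a sketch of the arithmetic, not a replacement for a statistics library):

```python
# Same 3 x 2 contingency table as the R example above (rows x columns)
observed = [[89, 9],
            [84, 8],
            [86, 24]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Sum (O - E)^2 / E over every cell, with E = (row total * column total) / n
chi_square = sum(
    (observed[r][c] - row_totals[r] * col_totals[c] / n) ** 2
    / (row_totals[r] * col_totals[c] / n)
    for r in range(len(row_totals))
    for c in range(len(col_totals))
)

print(round(chi_square, 2))  # 9.79
```

This should match the X-squared value that chisq.test reports for this matrix (no continuity correction is applied, since the table is larger than 2 × 2).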

What properties does the chi-square distribution have?

A chi-square distribution is a continuous probability distribution. The shape of a chi-square distribution depends on its degrees of freedom, k. The mean of a chi-square distribution is equal to its degrees of freedom (k) and the variance is 2k. The range is 0 to ∞.
