What Is Criterion Validity? | Definition & Examples

Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity compares test scores with criterion variables measured at the same time, while predictive validity compares them with criterion variables measured in the future.

To establish criterion validity, you need to compare your test results to criterion variables. Criterion variables are often referred to as a “gold standard” measurement. They comprise other tests that are widely accepted as valid measures of a construct.

Example: Criterion validity
A researcher wants to know whether a college entrance exam is able to predict future academic performance. First-semester GPA can serve as the criterion variable, as it is an accepted measure of academic performance.

The researcher can then compare the college entrance exam scores of 100 students to their GPA after one semester in college. If exam scores correlate strongly with first-semester GPA, then the college entrance exam has criterion validity.

When your test agrees with the criterion variable, it has high criterion validity. However, criterion variables can be difficult to find.

What is criterion validity?

Criterion validity shows you how well a test correlates with an established standard of comparison called a criterion.

A measurement instrument, like a questionnaire, has criterion validity if its results converge with those of some other, accepted instrument, commonly called a “gold standard.”

A gold standard (or criterion variable) measures:

  • The same construct
  • Conceptually relevant constructs
  • Conceptually relevant behavior or performance

When a gold standard exists, evaluating criterion validity is a straightforward process. For example, you can compare a new questionnaire with an established one. In medical research, you can compare test scores with clinical assessments.

However, in many cases, there is no existing gold standard. If you want to measure pain, for example, there is no objective standard to compare against. You must rely on what respondents tell you. In such cases, you can’t establish criterion validity.

It’s important to keep in mind that criterion validity is only as good as the validity of the gold standard or reference measure. If the reference measure is biased, it can impact an otherwise valid measure. In other words, a valid measure tested against a biased gold standard may fail to achieve criterion validity.

Similarly, two measures that share the same bias can confirm one another. Thus, criterion validity is no guarantee that a measure is in fact valid. It’s best used in tandem with the other types of validity.

Types of criterion validity

There are two types of criterion validity. Which type you use depends on the time at which the two measures (the criterion and your test) are obtained.

  • Concurrent validity is used when the scores of a test and the criterion variables are obtained at the same time.
  • Predictive validity is used when the criterion variables are measured after the scores of the test.

Concurrent validity

Concurrent validity is demonstrated when a new test correlates with another test that is already considered valid, called the criterion test. A high correlation between the new test and the criterion indicates concurrent validity.

Establishing concurrent validity is particularly important when a new measure is created that claims to be better in some way than its predecessors: more objective, faster, cheaper, etc.

Example: Concurrent validity
A psychologist wants to evaluate a self-report test on body image dissatisfaction. The concurrent validity of the test can be assessed by comparing the scores of the test with a clinical diagnosis that was made at the same time.
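When the criterion is a yes/no outcome such as a diagnosis, one common way to quantify the comparison is the point-biserial correlation, which is Pearson’s r applied to one continuous and one dichotomous variable. The sketch below is only an illustration: the scores, the 0/1 coding of the diagnosis, and the function name point_biserial are all assumptions, not part of the example above.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(scores, groups):
    """Point-biserial r between continuous scores and a 0/1 grouping."""
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    ones = [s for s, g in zip(scores, groups) if g == 1]
    zeros = [s for s, g in zip(scores, groups) if g == 0]
    if not ones or not zeros:
        raise ValueError("each group needs at least one respondent")
    sd = pstdev(scores)  # population standard deviation of all scores
    if sd == 0:
        raise ValueError("r is undefined when all scores are identical")
    n = len(scores)
    # Difference in group means, scaled by the spread and the group sizes.
    return (mean(ones) - mean(zeros)) / sd * sqrt(len(ones) * len(zeros) / n**2)


# Hypothetical self-report scores (higher = more dissatisfaction).
self_report = [12, 30, 25, 8, 33, 15, 28, 10]
# Hypothetical clinical diagnosis made at the same time (1 = yes, 0 = no).
diagnosis = [0, 1, 1, 0, 1, 0, 1, 0]

r_pb = point_biserial(self_report, diagnosis)
print(f"point-biserial r = {r_pb:.2f}")
```

A value close to +1 would mean that respondents with a diagnosis consistently score higher on the self-report test than those without one.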

Remember that this form of validity can only be used if another criterion or validated instrument already exists.

Predictive validity

Predictive validity is demonstrated when a test can predict future performance. In other words, the test must correlate with a variable that can only be assessed at some point in the future, after the test has been administered.

For predictive criterion validity, researchers often examine how the results of a test predict a relevant future outcome. For example, the results of an IQ test can be used to predict future educational achievement. The outcome is, by design, assessed at some point in the future.

Example: Predictive validity
Suppose you want to find out whether a college entrance math test can predict a student’s future performance in an engineering study program.

A student’s GPA is a widely accepted marker of academic performance and can be used as a criterion variable. To assess the predictive validity of the math test, you compare how students scored in that test to their GPA after the first semester in the engineering program. If high test scores were associated with individuals who later performed well in their studies and achieved a high GPA, then the math test would have strong predictive validity.

A high correlation provides evidence of predictive validity. It indicates that a test can correctly predict something that you hypothesize it should.

Criterion validity example

Criterion validity is often used when a researcher wishes to replace an established test with a different version of the same test, particularly one that is more objective, shorter, or cheaper.

Example: Criterion validity
A school psychologist creates a shorter form of an existing survey to assess procrastination among students.

Although the original test is widely accepted as a valid measure of procrastination, it is very long and takes a lot of time to complete. As a result, many students fill it in without carefully considering their answers.

To evaluate how well the new, shorter test assesses procrastination, the psychologist asks the same group of students to take both the new and the original test. If the results of the two tests are similar, the new test has high criterion validity. The psychologist can be confident that the new test will measure procrastination as accurately as the original.

How to measure criterion validity

Criterion validity is assessed in two ways:

  • By statistically testing a new measurement technique against an independent criterion or standard to establish concurrent validity
  • By statistically testing against a future performance to establish predictive validity

The measure to be validated, such as a test, should be correlated with a measure considered to be a well-established indication of the construct under study. This is your criterion variable.

Correlations between the scores on the test and the criterion variable are calculated using a correlation coefficient, such as Pearson’s r. A correlation coefficient expresses the strength of the relationship between two variables in a single value between −1 and +1.
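For reference, the standard formula for Pearson’s r is given below. The notation is an assumption introduced here: x_i is a respondent’s score on the test, y_i is the same respondent’s score on the criterion variable, and the bars denote sample means over n respondents.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```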

Correlation coefficient values can be interpreted as follows:

  • r = 1: There is a perfect positive correlation.
  • r = 0: There is no correlation at all.
  • r = −1: There is a perfect negative correlation.

You can automatically calculate Pearson’s r in Excel, R, SPSS or other statistical software.

A strong positive correlation between a test and the criterion variable is evidence that the test is valid. A correlation near zero, or a negative correlation, indicates that the test and criterion variable do not measure the same concept.
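If you prefer to see the calculation spelled out rather than rely on statistical software, the following is a minimal Python sketch of Pearson’s r. The function name pearson_r and the scores are hypothetical and serve only to illustrate the steps.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products and sums of squares.
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical paired scores from the same five respondents.
test_scores = [10, 12, 15, 18, 20]
criterion_scores = [11, 13, 14, 19, 21]

r = pearson_r(test_scores, criterion_scores)
print(f"r = {r:.2f}")
```

Each respondent must contribute one score to each list, in the same order, because the coefficient is computed on pairs.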

Example: Measuring criterion validity
Suppose you are interested in developing your own scale measuring self-esteem. To establish criterion validity, you need to compare it to a criterion variable, in this case an existing self-esteem scale that has already been validated.

You give the two scales to the same sample of respondents. The extent of agreement between the results of the two scales is expressed through a correlation coefficient.

You calculate the correlation coefficient between the results of the two tests and find out that your scale correlates with the existing scale (r = 0.80). This value shows that there is a strong positive correlation between the two scales.

In other words, your scale is accurately measuring the same construct operationalized in the validated scale.

Frequently asked questions about criterion validity

What is the difference between criterion validity and construct validity?

Criterion validity and construct validity are both types of measurement validity. In other words, they both show you how accurately a method measures something.

While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which test scores correspond to an external criterion, measured either at the same time (concurrent validity) or in the future (predictive validity).

Construct validity is often considered the overarching type of measurement validity. You need to have face validity, content validity, and criterion validity in order to achieve construct validity.

Why does construct validity matter?

When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.

Construct validity is often considered the overarching type of measurement validity because it covers all of the other types. You need to have face validity, content validity, and criterion validity to achieve construct validity.

What’s the difference between reliability and validity?

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

Why is face validity important?

Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.

Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.

Kassiani Nikolopoulou

Kassiani has an academic background in Communication, Bioeconomy and Circular Economy. As a former journalist she enjoys turning complex scientific information into easily accessible articles to help students.