What Is Criterion Validity? | Definition & Examples
Criterion validity (or criterion-related validity) evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity compares the test and the criterion variable at the same point in time, while predictive validity compares the test with a criterion measured in the future.
To establish criterion validity, you need to compare your test results to criterion variables. Criterion variables are often referred to as a “gold standard” measurement: other tests or measures that are widely accepted as valid measures of the construct.
For example, suppose a researcher develops a college entry exam intended to predict academic performance. The researcher can then compare the college entry exam scores of 100 students to their GPA after one semester in college. If the two sets of scores correlate strongly, the college entry exam has criterion validity.
When your test agrees with the criterion variable, it has high criterion validity. However, criterion variables can be difficult to find.
What is criterion validity?
Criterion validity shows you how well a test correlates with an established standard of comparison called a criterion.
A measurement instrument, like a questionnaire, has criterion validity if its results converge with those of some other, accepted instrument, commonly called a “gold standard.”
A gold standard (or criterion variable) measures:
- The same construct
- Conceptually relevant constructs
- Conceptually relevant behavior or performance
When a gold standard exists, evaluating criterion validity is a straightforward process. For example, you can compare a new questionnaire with an established one. In medical research, you can compare test scores with clinical assessments.
However, in many cases, there is no existing gold standard. If you want to measure pain, for example, there is no objective standard to do so. You must rely on what respondents tell you. In such cases, you can’t establish criterion validity.
It’s important to keep in mind that criterion validity is only as good as the validity of the gold standard or reference measure. If the reference measure is biased, comparisons against it are distorted. In other words, a valid measure tested against a biased gold standard may fail to achieve criterion validity.
Similarly, two measures that share the same bias may confirm one another. Thus, criterion validity is no guarantee that a measure is in fact valid. It’s best used in tandem with the other types of validity.
Types of criterion validity
There are two types of criterion validity. Which type you use depends on the time at which the two measures (the criterion and your test) are obtained.
- Concurrent validity is used when the scores of a test and the criterion variables are obtained at the same time.
- Predictive validity is used when the criterion variables are measured after the scores of the test.
Concurrent validity is demonstrated when a new test correlates with another test that is already considered valid, called the criterion test. A high correlation between the new test and the criterion indicates concurrent validity.
Establishing concurrent validity is particularly important when a new measure is created that claims to be better in some way than its predecessors: more objective, faster, cheaper, etc.
Remember that this form of validity can only be used if another criterion or validated instrument already exists.
Predictive validity is demonstrated when a test can predict future performance. In other words, the test must correlate with a variable that can only be assessed at some point in the future, after the test has been administered.
For predictive criterion validity, researchers often examine how the results of a test predict a relevant future outcome. For example, the results of an IQ test can be used to predict future educational achievement. The outcome is, by design, assessed at some point in the future.
A high correlation provides evidence of predictive validity. It indicates that a test can correctly predict something that you hypothesize it should.
Criterion validity example
Criterion validity is often used when a researcher wishes to replace an established test with a different version of the same test, particularly one that is more objective, shorter, or cheaper. For instance, a researcher might validate a shortened version of a lengthy questionnaire by showing that its scores correlate strongly with those of the full instrument.
How to measure criterion validity
Criterion validity is assessed in two ways:
- By statistically testing a new measurement technique against an established criterion measured at the same time, to establish concurrent validity
- By statistically testing the new measure against a relevant outcome measured in the future, to establish predictive validity
The measure to be validated, such as a test, should be correlated with a measure considered to be a well-established indication of the construct under study. This is your criterion variable.
Correlations between the scores on the test and the criterion variable are calculated using a correlation coefficient, such as Pearson’s r. A correlation coefficient expresses the strength of the relationship between two variables in a single value between −1 and +1.
Correlation coefficient values can be interpreted as follows:
- r = 1: There is a perfect positive correlation.
- r = 0: There is no correlation at all.
- r = −1: There is a perfect negative correlation.
A strong positive correlation between the test and the criterion variable provides evidence that the test is valid. Little or no correlation, or a negative correlation, indicates that the test and the criterion variable do not measure the same concept.
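The interpretation above can be made concrete by computing Pearson's r directly from its definition: the covariance of the two score sets divided by the product of their standard deviations. This is a self-contained sketch with invented example scores (the questionnaire data below is hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of paired deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the (unnormalized) standard deviations.
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores: a new questionnaire vs. an established criterion
# measured at the same time (concurrent validity).
new_test = [12, 15, 9, 20, 17, 14]
criterion = [30, 38, 25, 48, 42, 35]
print(f"r = {pearson_r(new_test, criterion):.2f}")
```

An r near +1 would support the new questionnaire's concurrent validity; an r near 0 or below would argue against it. In practice you would also report a significance test alongside r, which library functions such as `scipy.stats.pearsonr` provide.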
Frequently asked questions about criterion validity
- What is the difference between criterion validity and construct validity?
While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test’s results correspond to those of a criterion variable, measured either at the same time (concurrent validity) or in the future (predictive validity).
- Why does construct validity matter?
When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.
Construct validity is often considered the overarching type of measurement validity, because it covers all of the other types. You need to have face validity, content validity, and criterion validity to achieve construct validity.
- What’s the difference between reliability and validity?
Reliability and validity are both about how well a method measures something:
- Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
- Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).
If you are doing experimental research, you also have to consider the internal and external validity of your experiment.
- Why is face validity important?
Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.
Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.