Adaptive Testing and Validity: Keeping Measurement Sound

Computer-adaptive assessment can shorten tests and sharpen precision, but it must be designed with validity in mind. This article outlines how item selection, calibration, and scoring interact so that adaptive instruments remain interpretable and fair for high-stakes use.

Introduction

Computer-adaptive testing (CAT) selects items based on the test-taker’s evolving ability estimate, which can reduce test length and improve measurement efficiency. Gains in efficiency must not come at the cost of validity. Construct representation, score interpretability, and fairness need to be built into the design from the start.
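To make that cycle concrete, here is a minimal sketch of an adaptive loop, assuming a tiny pre-calibrated bank of two-parameter logistic (2PL) items, maximum-information selection, and a coarse expected a posteriori (EAP) update on a grid; the parameter values, stopping rule, and simulated examinee are invented for illustration and do not reflect any operational engine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibrated bank: 2PL discrimination (a) and difficulty (b) per item.
bank_a = np.array([1.2, 0.8, 1.5, 1.0, 1.3, 0.9])
bank_b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])

grid = np.linspace(-4, 4, 81)          # ability (theta) quadrature points
posterior = np.exp(-0.5 * grid ** 2)   # standard-normal prior, updated as we go

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

true_theta = 0.7                       # simulated examinee
administered = []

for _ in range(4):                     # fixed-length stopping rule, for brevity
    theta_hat = np.sum(grid * posterior) / posterior.sum()   # provisional EAP estimate
    info = information(theta_hat, bank_a, bank_b)
    info[np.array(administered, dtype=int)] = -np.inf        # skip items already used
    item = int(np.argmax(info))                              # most informative remaining item
    administered.append(item)
    # Simulate a response and update the posterior over the ability grid.
    correct = rng.random() < p_correct(true_theta, bank_a[item], bank_b[item])
    like = p_correct(grid, bank_a[item], bank_b[item])
    posterior = posterior * (like if correct else 1.0 - like)

print("items administered:", administered)
print("final EAP estimate:", round(np.sum(grid * posterior) / posterior.sum(), 2))
```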

Item Selection and Calibration

Adaptive algorithms rely on a calibrated item bank. Items must be placed on a common scale via a suitable IRT model so that the selection algorithm can compare candidate items meaningfully. Poor calibration propagates into biased ability estimates and inconsistent test information. Regular linking studies and quality control of the bank are essential.
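As one concrete sense of "a common scale", the sketch below applies a mean-sigma linking transformation, placing a new calibration's 2PL parameters onto the bank's scale using shared anchor items; the anchor difficulties and the linked item are hypothetical, and an operational linking study would use more anchors and additional quality-control diagnostics.

```python
import numpy as np

# Hypothetical 2PL anchor-item difficulties estimated in two separate calibrations.
b_reference = np.array([-1.2, -0.4, 0.3, 1.1])   # bank (target) scale
b_new_form  = np.array([-0.9, -0.1, 0.7, 1.4])   # new calibration's scale

# Mean-sigma linking: slope and intercept that map the new scale onto the bank scale.
A = b_reference.std(ddof=1) / b_new_form.std(ddof=1)
B = b_reference.mean() - A * b_new_form.mean()

def to_bank_scale(a, b):
    """Rescale 2PL parameters from the new calibration onto the bank scale."""
    return a / A, A * b + B

# A non-anchor item from the new form, now expressed on the common bank scale.
a_linked, b_linked = to_bank_scale(1.1, 0.25)
print(f"slope A={A:.3f}, intercept B={B:.3f}, linked item: a={a_linked:.3f}, b={b_linked:.3f}")
```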

Scoring and Interpretability

CAT scores are typically reported on the same scale as the item bank. Transparency about the scoring model (e.g. maximum likelihood or expected a posteriori (EAP) estimation) and the meaning of the scale helps stakeholders interpret results. Where cut scores or bands are used, they should be justified with standard-setting procedures that account for the adaptive design.
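For instance, an EAP score and its posterior standard deviation can be computed on a quadrature grid, as in the minimal sketch below; the item parameters and response pattern are invented, and a real programme would document its actual estimator and reporting scale.

```python
import numpy as np

grid = np.linspace(-4, 4, 161)              # quadrature points on the theta scale
prior = np.exp(-0.5 * grid ** 2)            # standard-normal prior (unnormalised)

# Hypothetical 2PL parameters for the items this examinee saw, plus scored responses.
a = np.array([1.4, 1.0, 1.2, 0.8])
b = np.array([-0.6, 0.1, 0.5, 1.2])
u = np.array([1, 1, 0, 1])                  # 1 = correct, 0 = incorrect

p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))     # grid x items probabilities
likelihood = np.prod(np.where(u == 1, p, 1.0 - p), axis=1)
posterior = prior * likelihood
posterior /= posterior.sum()

eap = np.sum(grid * posterior)                          # reported score (theta metric)
psd = np.sqrt(np.sum((grid - eap) ** 2 * posterior))    # posterior SD, i.e. score precision
print(f"EAP = {eap:.2f}, posterior SD = {psd:.2f}")
```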

Fairness in High-Stakes Use

High-stakes uses require evidence that scores are equitable across relevant subgroups. Differential item functioning (DIF) in the bank should be monitored and addressed. Exposure control and content constraints can reduce over-reliance on narrow item subsets and support the validity of score interpretations across administrations.
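One simple exposure-control family is "randomesque" selection: rather than always administering the single most informative item, the engine picks at random among the few most informative candidates at the current estimate. The sketch below illustrates the idea with an invented bank and window size; it is not a recommendation for any particular operational setting.

```python
import numpy as np

rng = np.random.default_rng(42)

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def randomesque_pick(theta_hat, bank_a, bank_b, available, k=3):
    """Choose uniformly at random among the k most informative available items."""
    info = item_information(theta_hat, bank_a[available], bank_b[available])
    top_k = np.asarray(available)[np.argsort(info)[-k:]]
    return int(rng.choice(top_k))

# Hypothetical calibrated bank and a current provisional estimate.
bank_a = np.array([1.4, 0.9, 1.1, 1.6, 0.8, 1.2, 1.0, 1.3])
bank_b = np.array([-1.5, -0.8, -0.2, 0.0, 0.4, 0.9, 1.3, 1.8])
available = list(range(len(bank_a)))   # items not yet administered

# Repeated calls spread selections across the top candidates instead of one item.
print([randomesque_pick(0.1, bank_a, bank_b, available) for _ in range(5)])
```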

Conclusion

Adaptive testing offers real benefits when validity is prioritised alongside efficiency. Sound calibration, clear scoring, and attention to fairness and interpretability ensure that CAT-based assessments remain fit for purpose in high-stakes contexts.

