
A Test-Theoretic Approach to Observed-Score Equating

Published online by Cambridge University Press:  01 January 2025

Wim J. van der Linden*
Affiliation:
University of Twente
*
Requests for reprints should be sent to W.J. van der Linden, Department of Educational Measurement and Data Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, THE NETHERLANDS. E-Mail: [email protected]

Abstract

Observed-score equating using the marginal distributions of two tests is not necessarily the universally best approach it has been claimed to be. On the other hand, equating using the conditional distributions given the ability level of the examinee is theoretically ideal. Possible ways of dealing with the requirement of known ability are discussed, including such methods as conditional observed-score equating at point estimates or posterior expected conditional equating. The methods are generalized to the problem of observed-score equating with a multivariate ability structure underlying the scores.
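The contrast the abstract draws between marginal and conditional equating can be illustrated with a small sketch. The code below is not the paper's formulation; it is a hypothetical example assuming a Rasch model, using the well-known Lord-Wingersky recursion to obtain the conditional number-correct distribution on each test at a given ability level, and then applying an equipercentile mapping between the two conditional distributions. Item difficulties and the ability value are invented for illustration.

```python
import numpy as np

def irf(theta, b):
    """Rasch item response function: probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def conditional_score_dist(theta, difficulties):
    """Lord-Wingersky recursion: distribution of the number-correct
    score given ability theta, built up one item at a time."""
    dist = np.array([1.0])  # score distribution over zero items
    for b in difficulties:
        p = irf(theta, b)
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1.0 - p)  # item answered incorrectly
        new[1:] += dist * p           # item answered correctly
        dist = new
    return dist

def equipercentile_map(f_x, f_y):
    """Map each score on test X to the test-Y score with the same
    percentile rank, using linear interpolation to continuize."""
    F_x = np.cumsum(f_x)
    F_y = np.cumsum(f_y)
    return np.interp(F_x, F_y, np.arange(len(f_y)))

# Illustrative values (not from the paper): two 5-item tests,
# conditional equating at a single ability point theta = 0.5.
theta = 0.5
b_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
b_y = [-0.8, -0.2, 0.2, 0.8, 1.2]
f_x = conditional_score_dist(theta, b_x)
f_y = conditional_score_dist(theta, b_y)
equated = equipercentile_map(f_x, f_y)
```

Marginal equating would instead apply the same equipercentile step to score distributions averaged over the ability distribution; a posterior-expected variant, as mentioned in the abstract, would average the conditional equating transformation over the posterior of ability.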

Type
Original Paper
Copyright
Copyright © 2000 The Psychometric Society


Footnotes

This article is based on the author's Presidential Address given on July 7, 2000 at the 65th Annual Meeting of the Psychometric Society held at the University of British Columbia, Vancouver, Canada.

The author is most indebted to Wim M.M. Tielen for his computational assistance and Cees A.W. Glas for his comments on a draft of this paper.
