Multiple Equating of Separate IRT Calibrations

Michela Battauz

doi:10.1007/s11336-016-9517-x

Published online by Cambridge University Press: 01 January 2025

Michela Battauz*
Affiliation: University of Udine
* Correspondence should be made to Michela Battauz, Department of Economics and Statistics, University of Udine, Udine, Italy. Email: [email protected]; http://people.uniud.it/page/michela.battauz

Abstract

When test forms are calibrated separately, item response theory parameters are not comparable because they are expressed on different measurement scales. The equating process includes the conversion of item parameter estimates to a common scale and the determination of comparable test scores. Various statistical methods have been proposed to perform equating between two test forms. This paper generalizes the mean-geometric mean, the mean-mean, the Haebara, and the Stocking–Lord methods to multiple test forms. The proposed methods simultaneously estimate the equating coefficients that transform the parameters of all forms to the scale of the base form. Asymptotic standard errors of the equating coefficients are derived. A simulation study is presented to illustrate the performance of the methods.
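As a point of reference for the scale conversion described above, the following is a minimal sketch of the classical two-form mean-mean and mean-geometric mean methods under a two-parameter logistic model. It is not the multiple-form estimator proposed in the paper. It assumes the usual linear relation between scales, in which a discrimination a and a difficulty b on the original form become a / A and A * b + B on the base form. The function names and the numeric values in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def mean_mean(a_from, b_from, a_base, b_base):
    """Two-form mean-mean equating coefficients (A, B).

    Inputs are estimates for the common items only, listed in the same
    order on both forms (assumption of this sketch).
    """
    n = len(a_from)
    A = (sum(a_from) / n) / (sum(a_base) / n)
    B = sum(b_base) / n - A * sum(b_from) / n
    return A, B


def mean_geometric_mean(a_from, b_from, a_base, b_base):
    """Two-form mean-geometric mean equating coefficients (A, B)."""
    n = len(a_from)
    # Geometric mean of the ratios of discriminations, computed on the log scale.
    A = math.exp(sum(math.log(x / y) for x, y in zip(a_from, a_base)) / n)
    B = sum(b_base) / n - A * sum(b_from) / n
    return A, B


def to_base_scale(a, b, A, B):
    """Convert item parameters to the base scale: a / A and A * b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]


# Hypothetical common-item parameters on the base scale.
a_base = [0.8, 1.1, 1.5, 0.9]
b_base = [-1.0, 0.2, 0.7, 1.4]

# The same items expressed on another scale that differs by A = 1.2, B = -0.5.
A_true, B_true = 1.2, -0.5
a_from = [x * A_true for x in a_base]
b_from = [(x - B_true) / A_true for x in b_base]

A_mm, B_mm = mean_mean(a_from, b_from, a_base, b_base)
A_mgm, B_mgm = mean_geometric_mean(a_from, b_from, a_base, b_base)
a_conv, b_conv = to_base_scale(a_from, b_from, A_mm, B_mm)
```

With error-free parameters, as in this constructed example, both methods recover the same coefficients. With estimated parameters they generally differ, which is one reason the choice of method matters in practice.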

Keywords

equating coefficients; Haebara; item response theory; linking; mean-geometric mean; mean-mean; standard errors; Stocking–Lord

Type: Original paper
Information: Psychometrika, Volume 82, Issue 3, September 2017, pp. 610–636

DOI: https://doi.org/10.1007/s11336-016-9517-x
Copyright: © 2016 The Psychometric Society
