
Lord–Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing

Published online by Cambridge University Press:  01 January 2025

Li Cai*
Affiliation:
University of California
*Correspondence should be sent to Li Cai, CRESST, University of California, Los Angeles, CA 90095-1521, USA. E-mail: [email protected]

Abstract

Lord and Wingersky’s (Appl Psychol Meas 8:453–461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord–Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord–Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.
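To make the recursion concrete, the following is a minimal sketch of the classic unidimensional Lord–Wingersky recursion for dichotomous items at fixed quadrature points, which the paper extends to hierarchical models. This is an illustration, not code from the paper; the function name and array layout are assumptions made here for clarity.

```python
import numpy as np

def lord_wingersky(probs):
    """Summed-score likelihoods via the Lord-Wingersky (1984) recursion.

    probs: array of shape (n_items, n_quad) giving P_i(correct | theta_q)
    for dichotomous items at each fixed quadrature point theta_q.
    Returns an array L of shape (n_items + 1, n_quad), where L[s, q] is
    the likelihood of summed score s given theta_q.
    """
    probs = np.asarray(probs, dtype=float)
    n_items, n_quad = probs.shape
    L = np.zeros((n_items + 1, n_quad))
    L[0] = 1.0  # before any item is added, score 0 with certainty
    for i in range(n_items):
        p, q = probs[i], 1.0 - probs[i]
        new = np.zeros_like(L)
        # Score s after item i arises either from score s plus an
        # incorrect response, or from score s - 1 plus a correct one.
        new[: i + 2] = L[: i + 2] * q
        new[1 : i + 2] += L[: i + 1] * p
        L = new
    return L
```

Given quadrature weights `w`, the marginal summed-score distribution is then `(lord_wingersky(probs) * w).sum(axis=1)`, and posteriors for each summed score follow by normalizing each row. The paper's contribution is to carry out an analogous recursion on a dramatically reduced grid when the item factor structure is hierarchical.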

Type
Original Paper
Copyright
Copyright © 2014 The Psychometric Society


References

Bock, R.D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75, 33–57.
Cai, L. (2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612.
Cai, L. (2013). flexMIRT Version 2.0: Flexible multilevel item analysis and test scoring (Computer software). Chapel Hill, NC: Vector Psychometric Group LLC.
Cai, L., Thissen, D., & du Toit, S.H.C. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling (Computer software). Chicago, IL: Scientific Software International.
Cai, L., Yang, J.S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248.
Chen, W.H., & Thissen, D. (1999). Estimation of item parameters for the three-parameter logistic model using the marginal likelihood of summed scores. British Journal of Mathematical and Statistical Psychology, 52, 19–37.
Edwards, M.C. (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75, 474–497.
Ferrando, P.J., & Lorenzo-Seva, U. (2001). Checking the appropriateness of item response theory models by predicting the distribution of observed scores: The program EO-fit. Educational and Psychological Measurement, 61, 895–902.
Gibbons, R.D., & Hedeker, D. (1992). Full-information item bifactor analysis. Psychometrika, 57, 423–436.
Gibbons, R.D., Bock, R.D., Hedeker, D., Weiss, D.J., Segawa, E., & Bhaumik, D.K., et al. (2007). Full-information item bifactor analysis of graded response data. Applied Psychological Measurement, 31, 4–19.
Glas, C.A.W., Wainer, H., & Bradlow, E.T. (2000). Maximum marginal likelihood and expected a posteriori estimation in testlet-based adaptive testing. In van der Linden, W.J., & Glas, C.A.W. (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–288). Boston, MA: Kluwer.
Hambleton, R.K., & Traub, R.E. (1973). Analysis of empirical data using two logistic latent trait models. British Journal of Mathematical and Statistical Psychology, 26, 195–211.
Holzinger, K.J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41–54.
Ip, E.H. (2010). Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models. British Journal of Mathematical and Statistical Psychology, 63, 395–416.
Ip, E.H. (2010). Interpretation of the three-parameter testlet response model and information function. Applied Psychological Measurement, 34, 467–482.
Jeon, M., Rijmen, F., & Rabe-Hesketh, S. (2013). Modeling differential item functioning using a generalization of the multiple-group bifactor model. Journal of Educational and Behavioral Statistics, 38, 32–60.
Li, Y., & Rupp, A.A. (2011). Performance of the S−X² statistic for full-information bifactor models. Educational and Psychological Measurement, 71, 986–1005.
Li, Y., Bolt, D.M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3–21.
Li, Z., & Cai, L. (2012). Summed score based fit indices for testing latent variable distribution assumption in IRT. Paper presented at the 2012 International Meeting of the Psychometric Society, Lincoln, NE.
Lord, F.M. (1953). The relation of test score to the latent trait underlying the test. Educational and Psychological Measurement, 13, 517–548.
Lord, F.M., & Wingersky, M.S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 453–461.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Orlando, M., & Thissen, D. (2000). New item fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Orlando, M., Sherbourne, C.D., & Thissen, D. (2000). Summed-score linking using item response theory: Application to depression measurement. Psychological Assessment, 12, 354–359.
Reckase, M.D. (2009). Multidimensional item response theory. New York, NY: Springer.
Reeve, B.B., Hays, R.D., Bjorner, J.B., Cook, K.F., Crane, P.K., & Teresi, J.A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the patient-reported outcomes measurement information system (PROMIS). Medical Care, 45, 22–31.
Reise, S.P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
Rijmen, F. (2009). Efficient full information maximum likelihood estimation for multidimensional IRT models (Tech. Rep. No. RR-09-03). Educational Testing Service.
Rijmen, F. (2010). Formal relations and an empirical comparison between the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.
Rijmen, F., Vansteelandt, K., & De Boeck, P. (2008). Latent class models for diary method data: Parameter estimation by local computations. Psychometrika, 73, 167–182.
Rosa, K., Swygert, K.A., Nelson, L., & Thissen, D. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—scale scores for patterns of summed scores. In Thissen, D., & Wainer, H. (Eds.), Test scoring (pp. 253–292). Mahwah, NJ: Lawrence Erlbaum.
Ross, J. (1966). An empirical study of a logistic mental test model. Psychometrika, 31, 325–340.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monographs No. 17). Richmond, VA: Psychometric Society.
Schilling, S., & Bock, R.D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555.
Schmid, J., & Leiman, J.M. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53–61.
Sinharay, S., Johnson, M.S., & Stern, H.S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321.
Stucky, B.D., Thissen, D., & Edelen, M.O. (2013). Using logistic approximations of marginal trace lines to develop short assessments. Applied Psychological Measurement, 37, 41–57.
Thissen, D., & Wainer, H. (Eds.) (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V.S.L. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19, 39–49.
Thissen, D., Varni, J.W., Stucky, B.D., Liu, Y., Irwin, D.E., & DeWalt, D.A. (2011). Using the PedsQL™ 3.0 asthma module to obtain scores comparable with those of the PROMIS pediatric asthma impact scale (PAIS). Quality of Life Research, 20, 1497–1505.
Wainer, H., Bradlow, E.T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
Wirth, R.J., & Edwards, M.C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
Wu, E.J.C., & Bentler, P.M. (2011). EQSIRT: A user-friendly IRT program (Computer software). Encino, CA: Multivariate Software.
Yung, Y.F., McLeod, L.D., & Thissen, D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64, 113–128.