MCMC Estimation and Some Model-Fit Analysis of Multidimensional IRT Models

Published online by Cambridge University Press:  01 January 2025

A. A. Béguin*
Affiliation:
University of Twente
C. A. W. Glas
Affiliation:
University of Twente
* Requests for reprints should be sent to Anton Béguin, CITO group, P.O. Box 1034, 6801 MG Arnhem, THE NETHERLANDS.

Abstract

A Bayesian procedure to estimate the three-parameter normal ogive model and a generalization of the procedure to a model with multidimensional ability parameters are presented. The procedure is a generalization of a procedure by Albert (1992) for estimating the two-parameter normal ogive model. The procedure supports analyzing data from multiple populations and incomplete designs. It is shown that restrictions can be imposed on the factor matrix for testing specific hypotheses about the ability structure. The technique is illustrated using simulated and real data.

Type
Articles
Copyright
Copyright © 2001 The Psychometric Society

Footnotes

The authors would like to thank Norman Verhelst for his valuable comments and ACT, CITO group and SweSAT for the use of their data.

References

Ackerman, T.A. (1996). Developments in multidimensional item response theory. Applied Psychological Measurement, 20, 309-310.
Ackerman, T.A. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311-329.
ACT (1997). ACT Assessment Technical Manual. Iowa City, IA: Author.
Albert, J.H. (1992). Bayesian estimation of normal ogive item response functions using Gibbs sampling. Journal of Educational Statistics, 17, 251-269.
Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123-140.
Baker, F.B. (1998). An investigation of item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22, 153-169.
Bock, R.D., Gibbons, R.D., & Muraki, E. (1988). Full-information factor analysis. Applied Psychological Measurement, 12, 261-280.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: An application of an EM-algorithm. Psychometrika, 46, 443-459.
Bock, R.D., & Schilling, S.G. (1997). High dimensional full-information item factor analysis. In Berkane, M. (Ed.), Latent variable modeling and applications of causality (pp. 163-176). New York, NY: Springer.
Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In van der Linden, W.J., & Hambleton, R.K. (Eds.), Handbook of modern item response theory (pp. 433-448). New York, NY: Springer.
Box, G., & Tiao, G. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
Bradlow, E.T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
Cressie, N., & Holland, P.W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48, 129-141.
Fischer, G.H. (1995). Derivations of the Rasch model. In Fischer, G.H., & Molenaar, I.W. (Eds.), Rasch models: Foundations, recent developments and applications (pp. 15-38). New York, NY: Springer.
Fox, J.P., & Glas, C.A.W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271-288.
Fraser, C. (1988). NOHARM: A computer program for fitting both unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: University of New England.
Gelfand, A.E., & Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (1995). Bayesian data analysis. London: Chapman and Hall.
Glas, C.A.W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525-546.
Glas, C.A.W. (1998). Detection of differential item functioning using Lagrange multiplier tests. Statistica Sinica, 8(1), 647-667.
Glas, C.A.W. (1999). Modification indices for the 2-pl and the nominal response model. Psychometrika, 64, 273-294.
Glas, C.A.W., & Ellis, J.L. (1993). RSP, Rasch scaling program, computer program and user's manual. Groningen: ProGAMMA.
Glas, C.A.W., & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54, 635-659.
Glas, C.A.W., & Verhelst, N.D. (1995). Tests of fit for polytomous Rasch models. In Fischer, G.H., & Molenaar, I.W. (Eds.), Rasch models: Foundations, recent developments and applications (pp. 325-352). New York, NY: Springer.
Glas, C.A.W., Wainer, H., & Bradlow, E.T. (2000). MML and EAP estimates for the testlet response model. In van der Linden, W.J., & Glas, C.A.W. (Eds.), Computerized adaptive testing: Theory and practice (pp. 271-287). Boston, MA: Kluwer Academic Publishers.
Hoijtink, H., & Molenaar, I.W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.
Holland, P.W., & Rosenbaum, P.R. (1986). Conditional association and uni-dimensionality in monotone latent variable models. Annals of Statistics, 14, 1523-1543.
Junker, B. (1991). Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255-278.
Kelderman, H. (1984). Loglinear RM tests. Psychometrika, 49, 223-245.
Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54, 681-697.
Lawley, D.N. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 61, 273-287.
Lawley, D.N. (1944). The factorial analysis of multiple test items. Proceedings of the Royal Society of Edinburgh, Series A, 62, 74-82.
Lord, F.M. (1952). A theory of test scores. Psychometric Monograph No. 7.
Lord, F.M. (1953). An application of confidence intervals and of maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57-75.
Lord, F.M. (1953). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13, 517-548.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Lord, F.M., & Wingersky, M.S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 453-461.
Martin-Löf, P. (1973). Statistika Modeller [Statistical models]. Stockholm: Institutet för Försäkringsmatematik och Matematisk Statistik vid Stockholms Universitet.
Martin-Löf, P. (1974). The notion of redundancy and its use as a quantitative measure of the discrepancy between a statistical hypothesis and a set of observational data. Scandinavian Journal of Statistics, 1, 3-18.
McDonald, R.P. (1967). Nonlinear factor analysis. Psychometric Monograph No. 15.
McDonald, R.P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6, 379-396.
McDonald, R.P. (1997). Normal-ogive multidimensional model. In van der Linden, W.J., & Hambleton, R.K. (Eds.), Handbook of modern item response theory (pp. 257-269). New York, NY: Springer.
Mellenbergh, G.J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300-307.
Meng, X.L., & Schilling, S.G. (1996). Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91, 1254-1267.
Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195.
Mislevy, R.J., & Bock, R.D. (1990). PC-BILOG. Item analysis and test scoring with binary logistic models. Chicago, IL: Scientific Software International.
Mislevy, R.J., & Wu, P.K. (2001). Missing responses and IRT ability estimation: Omits, choice, time limits and adaptive testing. Princeton, NJ: Educational Testing Service.
Molenaar, I.W. (1995). Estimation of item parameters. In Fischer, G.H., & Molenaar, I.W. (Eds.), Rasch models: Foundations, recent developments and applications (pp. 39-51). New York, NY: Springer.
Patz, R.J., & Junker, B.W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146-178.
Patz, R.J., & Junker, B.W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342-366.
Reckase, M.D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.
Reckase, M.D. (1997). A linear logistic multidimensional model for dichotomous item response data. In van der Linden, W.J., & Hambleton, R.K. (Eds.), Handbook of modern item response theory (pp. 271-286). New York, NY: Springer.
Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. In Blegvad, M. (Ed.), The Danish yearbook of philosophy (pp. 58-94). Copenhagen: Munksgaard.
Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509-528.
Rosenbaum, P.R. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425-436.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-592.
Shi, J.Q., & Lee, S.Y. (1998). Bayesian sampling based approach for factor analysis models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 51, 233-252.
Sijtsma, K. (1998). Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Applied Psychological Measurement, 22, 3-32.
Stout, W.F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika, 52, 589-617.
Stout, W.F. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika, 55, 293-326.
Thurstone, L.L. (1947). Multiple factor analysis. Chicago, IL: University of Chicago Press.
Wainer, H., Bradlow, E.T., & Du, Z. (2000). Testlet response theory: An analog for the 3pl model useful in testlet-based adaptive testing. In van der Linden, W.J., & Glas, C.A.W. (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). Boston, MA: Kluwer Academic Publishers.
Wilson, D.T., Wood, R., & Gibbons, R. (1991). TESTFACT: Test scoring, item statistics, and item factor analysis [Computer program]. Chicago, IL: Scientific Software International.
Yen, W.M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.
Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145.
Zimowski, M.F., Muraki, E., Mislevy, R.J., & Bock, R.D. (2001). Bilog MG: Multiple-group IRT analysis and test maintenance for binary items. Chicago, IL: Scientific Software International.