
A Person Fit Test for IRT Models for Polytomous Items

Published online by Cambridge University Press:  01 January 2025

C. A. W. Glas*, University of Twente, The Netherlands
Anna Villa T. Dagohoy, University of Twente, The Netherlands

*Requests for reprints should be sent to Cees A.W. Glas, Department of Research Methodology, Measurement and Data Analysis, University of Twente, P.O. Box 217, 7500AE, Enschede, The Netherlands.

Abstract

A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used when the ability parameters are multidimensional. It is shown that the Lagrange multiplier statistic can take into account the effects of estimating both the item parameters and the person parameters. The Lagrange multiplier statistic has an asymptotic χ²-distribution. Type I error rates and power are investigated in simulation studies. Results show that test statistics that ignore the effects of estimating the persons' ability parameters have decreased Type I error rates and power. Incorporating a correction for the estimation of the persons' ability parameters yields acceptable Type I error rates and power; additionally correcting for the estimation of the item parameters has very little effect. The extent to which the three models give comparable results is investigated, both in the simulation studies and in an example using data from the NEO Personality Inventory-Revised.
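As a rough illustration of how a Lagrange multiplier (score) statistic can correct for an estimated ability parameter, the sketch below computes a one-degree-of-freedom statistic for a single respondent under the generalized partial credit model. Several choices here are our assumptions, not the paper's: item parameters are treated as known (so there is no item-parameter correction), the model is written in Muraki's form with step parameters b, and misfit is framed as a shift in ability on a chosen subset of items. The function names `gpcm_probs`, `moments`, `ml_theta` and `lm_person_fit` are hypothetical. The score for the shift is evaluated at the ML estimate of θ, and its variance is reduced by the part absorbed by estimating θ.

```python
import math

# Sketch only: item parameters are assumed known, the GPCM is written in
# Muraki's form, and misfit is framed (our assumption) as an ability shift
# on a subset of items. Function names are hypothetical.


def gpcm_probs(theta, a, b):
    """Category probabilities P(X = 0..m) for one GPCM item.

    a: discrimination; b: step parameters b_1..b_m.
    Log-numerator of category j is sum_{h<=j} a * (theta - b_h).
    """
    logits = [0.0]
    for bh in b:
        logits.append(logits[-1] + a * (theta - bh))
    top = max(logits)  # subtract the max for numerical stability
    ex = [math.exp(z - top) for z in logits]
    total = sum(ex)
    return [e / total for e in ex]


def moments(theta, a, b):
    """Expected item score and its variance at theta."""
    p = gpcm_probs(theta, a, b)
    mean = sum(j * pj for j, pj in enumerate(p))
    var = sum((j - mean) ** 2 * pj for j, pj in enumerate(p))
    return mean, var


def ml_theta(x, items, theta=0.0, iters=100):
    """Newton-Raphson ML estimate of theta (undefined for all-min/all-max patterns)."""
    for _ in range(iters):
        score = info = 0.0
        for xi, (a, b) in zip(x, items):
            m, v = moments(theta, a, b)
            score += a * (xi - m)  # d log L / d theta
            info += a * a * v      # Fisher information for theta
        step = max(-1.0, min(1.0, score / info))  # damped Newton step
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


def lm_person_fit(x, items, subset):
    """Return (theta_hat, LM, p) for H0: no ability shift on `subset`.

    `subset` must be a nonempty proper subset of item indices.
    """
    theta = ml_theta(x, items)
    h = i_gg = i_tt = 0.0
    for idx, (xi, (a, b)) in enumerate(zip(x, items)):
        m, v = moments(theta, a, b)
        i_tt += a * a * v
        if idx in subset:
            h += a * (xi - m)  # score for the shift parameter
            i_gg += a * a * v  # its information; also equals I(shift, theta)
    # Variance of h after accounting for the estimation of theta
    w = i_gg - i_gg * i_gg / i_tt
    lm = h * h / w
    p = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-square with 1 df
    return theta, lm, p


if __name__ == "__main__":
    items = [(1.0, [-1.0, 0.0, 1.0])] * 6
    print(lm_person_fit([3, 3, 3, 0, 0, 0], items, {0, 1, 2}))
```

With six identical four-category items, the pattern that is maximal on the first half and minimal on the second gives θ̂ near 0 and a statistic of about 17, flagging misfit. Replacing `w` by `i_gg` gives an uncorrected statistic; because `w` never exceeds `i_gg`, the uncorrected value is always smaller, which is consistent with the decreased Type I error rates and power described in the abstract for statistics that ignore ability estimation.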

Type: Original Paper
Copyright © 2007 The Psychometric Society

