Fitting Rasch Model using Appropriateness Measure Statistics

Published online by Cambridge University Press:  10 April 2014

José Antonio López Pina*
Affiliation:
University of Murcia
M. Dolores Hidalgo Montesinos
Affiliation:
University of Murcia
* Correspondence should be addressed to: José A. López Pina, Depto. de Psicología Básica y Metodología, Facultad de Psicología, Campus de Espinardo, 30100-Murcia (Spain). Phone: 968-363478. Fax: 968-364115. E-mail: [email protected]

Abstract

This paper explores the distributional properties and power rates of the Lz, Eci2z, and Eci4z statistics when they are used as item-fit statistics. The results were compared with the t-transformations of the Outfit and Infit mean squares (T-outfit and T-infit). Four sample sizes were selected: 100, 250, 500, and 1000 examinees. The ability distributions were uniform and normal with mean 0 and standard deviation 1, and uniform and normal with mean –1 and standard deviation 1. The pseudo-guessing parameter was fixed at .25. Two ranges of difficulty parameters were selected: ±1 logits and ±2 logits. Two test lengths were selected: 15 and 30 items. The results showed important differences between the T-infit, T-outfit, Lz, Eci2z, and Eci4z statistics. The T-outfit, T-infit, and Lz statistics showed poor standardization with estimated parameters because their distributional properties were not close to the expected values. However, the Eci2z and Eci4z statistics showed satisfactory standardization in all conditions. Further, the power rates of Eci2z and Eci4z in detecting items that do not fit the Rasch model were 5% to 10% higher than those of Lz, T-outfit, and T-infit.

The aim of this study was to examine the power and distributional properties of three appropriateness measurement statistics when they are used as item-fit statistics. The statistics compared were Lz, Eci2z, and Eci4z. The results were compared with the T-outfit and T-infit statistics. Four sample sizes were selected: 100, 250, 500, and 1000 examinees. Several ability distributions were studied: uniform and normal with mean 0 and standard deviation 1, and uniform and normal with mean –1 and standard deviation 1. The pseudo-guessing parameter was fixed at .25. Two uniform distributions, of ±1 logits and ±2 logits, were used for the difficulty parameters. Finally, two test lengths were considered: 15 and 30 items. The results showed that the Lz, T-outfit, and T-infit statistics do not approach their expected values when computed with estimated parameters, whereas the Eci2z and Eci4z statistics better preserved the properties of their theoretical distributions. In addition, the power of these latter two statistics to detect items that do not fit the Rasch model was between 5% and 10% higher than that of the Lz, T-outfit, and T-infit statistics.
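As an illustration of the kind of item-fit computation the abstract describes, the following is a minimal Python sketch, not the authors' procedure. It simulates one cell of the design (1000 examinees, normal abilities with mean 0 and standard deviation 1, 15 items within ±1 logits) and computes the Outfit mean square, the Infit mean square, and Lz for each item, using the standard textbook definitions. Several things here are assumptions: the function names (`rasch_prob`, `simulate_responses`, `item_fit`, `run_demo`) are hypothetical, the difficulties are evenly spaced rather than drawn at random, misfit is induced by giving one item a lower asymptote of .25, and the statistics use the generating parameters rather than estimated ones. The t-transformations and the Eci2z and Eci4z statistics are left out.

```python
import math
import random


def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_responses(thetas, difficulties, misfit_items, rng, c=0.25):
    """Simulate 0/1 responses; items in misfit_items get a lower asymptote c."""
    data = []
    for theta in thetas:
        row = []
        for j, b in enumerate(difficulties):
            p = rasch_prob(theta, b)
            if j in misfit_items:
                p = c + (1.0 - c) * p  # assumption: guessing is the source of misfit
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def item_fit(data, thetas, difficulties):
    """Return one (outfit_ms, infit_ms, lz) tuple per item, summing over persons.

    Uses the generating parameters (an assumption; the study used estimates).
    """
    n = len(data)
    stats = []
    for j, b in enumerate(difficulties):
        z2_sum = resid2_sum = info_sum = 0.0
        loglik = loglik_mean = loglik_var = 0.0
        for theta, row in zip(thetas, data):
            x = row[j]
            p = rasch_prob(theta, b)
            w = p * (1.0 - p)            # binomial variance of the response
            resid2 = (x - p) ** 2        # squared raw residual
            z2_sum += resid2 / w         # squared standardized residual
            resid2_sum += resid2
            info_sum += w
            loglik += x * math.log(p) + (1 - x) * math.log(1.0 - p)
            loglik_mean += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
            loglik_var += w * math.log(p / (1.0 - p)) ** 2
        outfit = z2_sum / n              # unweighted mean square
        infit = resid2_sum / info_sum    # information-weighted mean square
        lz = (loglik - loglik_mean) / math.sqrt(loglik_var)
        stats.append((outfit, infit, lz))
    return stats


def run_demo(seed=0, n_persons=1000, n_items=15):
    """Simulate one design cell with the hardest item made misfitting."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    # Assumption: evenly spaced difficulties covering -1 to +1 logits.
    difficulties = [-1.0 + 2.0 * j / (n_items - 1) for j in range(n_items)]
    misfit_items = {n_items - 1}
    data = simulate_responses(thetas, difficulties, misfit_items, rng)
    return item_fit(data, thetas, difficulties)


if __name__ == "__main__":
    for j, (outfit, infit, lz) in enumerate(run_demo()):
        print(f"item {j:2d}  outfit={outfit:.2f}  infit={infit:.2f}  lz={lz:.2f}")
```

Under these assumptions the fitting items should show mean squares close to 1 and Lz values near 0, while the last item, where low-ability examinees succeed more often than the Rasch model predicts, should show an inflated Outfit mean square and a strongly negative Lz. Repeating the run over many seeds and recording how often each statistic flags the misfitting item would give a rough analogue of the power rates discussed in the abstract.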

Type
Articles
Copyright
Copyright © Cambridge University Press 2005
