
A Procedure for Dimensionality Analyses of Response Data from Various Test Designs

Published online by Cambridge University Press: 01 January 2025

Jinming Zhang*
Affiliation:
University of Illinois at Urbana-Champaign
*Requests for reprints should be sent to Jinming Zhang, Department of Educational Psychology, University of Illinois at Urbana-Champaign, 1310 South Sixth Street, 236A Education Building, Champaign, IL 61820, USA. E-mail: [email protected]

Abstract

In some popular test designs (including computerized adaptive testing and multistage testing), many item pairs are not administered to any test takers, which may result in some complications during dimensionality analyses. In this paper, a modified DETECT index is proposed in order to perform dimensionality analyses for response data from such designs. It is proven in this paper that under certain conditions, the modified DETECT can successfully find the dimensionality-based partition of items. Furthermore, the modified DETECT index is decomposed into two parts, which can serve as indices of the reliability of results from the DETECT procedure when response data are judged to be multidimensional. A simulation study shows that the modified DETECT can successfully recover the dimensional structure of response data under reasonable specifications. Finally, the modified DETECT procedure is applied to real response data from two-stage tests to demonstrate how to utilize these indices and interpret their values in dimensionality analyses.
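The core idea in the abstract can be illustrated with a rough sketch: a DETECT-style index sums conditional covariances over item pairs, signed positively when a pair falls in the same cluster of the candidate partition and negatively otherwise, and the modification for adaptive or multistage designs restricts the sum to pairs that some examinees actually took together. The function below is a simplified, hypothetical illustration, not the author's estimator: the published DETECT estimator uses a more careful latent-trait surrogate and bias corrections (and its decomposition into two reliability indices) that are not reproduced here. The function name `modified_detect`, the NaN convention for non-administered items, and the rest-score conditioning are all assumptions made for this sketch.

```python
import numpy as np

def modified_detect(responses, partition):
    """A toy DETECT-style index for incomplete (e.g., multistage) designs.

    responses: (n_examinees, n_items) array of 0/1 item scores, with np.nan
               marking items an examinee was never administered.
    partition: length-n_items array of cluster labels for the candidate
               dimensionality-based partition of the items.

    Returns the average signed conditional covariance over item pairs that
    were administered together, scaled by 100 as DETECT values
    conventionally are.
    """
    n_items = responses.shape[1]
    total, n_pairs = 0.0, 0
    for i in range(n_items):
        for j in range(i + 1, n_items):
            # Keep only examinees who answered both items; in CAT or
            # multistage designs many pairs have no such examinees.
            both = ~np.isnan(responses[:, i]) & ~np.isnan(responses[:, j])
            if both.sum() < 2:
                continue  # pair never administered together: skip it
            xi, xj = responses[both, i], responses[both, j]
            # Condition on the rest score (observed total minus the pair)
            # as a crude surrogate for the dominant latent composite.
            rest = np.nansum(responses[both], axis=1) - xi - xj
            cov_sum, weight = 0.0, 0
            for s in np.unique(rest):
                grp = rest == s
                if grp.sum() < 2:
                    continue
                cov_sum += grp.sum() * np.cov(xi[grp], xj[grp])[0, 1]
                weight += grp.sum()
            if weight == 0:
                continue
            cov = cov_sum / weight
            # Same-cluster pairs count positively, cross-cluster negatively.
            sign = 1.0 if partition[i] == partition[j] else -1.0
            total += sign * cov
            n_pairs += 1
    return 100 * total / n_pairs if n_pairs else 0.0
```

Under this convention, a partition that matches the true dimensional structure yields a large positive value (within-cluster conditional covariances are positive, cross-cluster ones negative), while a mismatched partition yields a value near zero; unadministered pairs simply drop out of the average rather than contaminating it.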

Type
Original Paper
Copyright
Copyright © The Psychometric Society

