Variable-Length Stopping Rules for Multidimensional Computerized Adaptive Testing

Published online by Cambridge University Press: 01 January 2025

Chun Wang* (University of Washington)
David J. Weiss (University of Minnesota)
Zhuoran Shang (University of Minnesota)

* Correspondence should be made to Chun Wang, Measurement and Statistics, College of Education, University of Washington, 312E Miller Hall, Box 353600, Seattle, WA 98195-3600, USA. Email: [email protected]

Abstract

In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed in unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the same idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an “absolute change in theta” (CT) rule, which is useful when an item bank has run out of good items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1–17, 2010) also argued that a CAT should stop when the standard error no longer changes, which implies that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, and so there is no clear guideline on when to use which rule. This paper presents analytic results that show the connections among the various stopping rules within both unidimensional and multidimensional CAT. In particular, it is argued that the CT-rule alone can be unstable and can end the test prematurely. However, the CT-rule can be a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.
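The rules named in the abstract can be illustrated with a minimal Python sketch for the unidimensional 2PL case. It is not the authors' R code: the function names, the thresholds (`se_target=0.3`, `ct_threshold=0.02`) and the `max_items` cap are hypothetical values chosen for illustration. The standard error rule is checked first and the CT-rule acts only as a secondary check, in line with the abstract's argument that the CT-rule alone can end a test prematurely.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta, administered):
    """Asymptotic standard error: 1 / sqrt(sum of item information at theta).

    `administered` is a list of (a, b) tuples for the items given so far.
    """
    info = sum(item_information_2pl(theta, a, b) for a, b in administered)
    return 1.0 / math.sqrt(info)


def stop_reason(theta_history, se_history,
                se_target=0.3, ct_threshold=0.02, max_items=50):
    """Return why the CAT should stop, or None to continue.

    All thresholds are hypothetical illustration values.
    - "se": the latest standard error meets the precision target (primary rule)
    - "ct": the absolute change in the theta estimate between the last two
      items is below ct_threshold (secondary rule, diminishing returns)
    - "max_items": a fixed cap on test length has been reached
    """
    n_items = len(theta_history)
    if se_history[-1] <= se_target:
        return "se"
    if n_items >= 2 and abs(theta_history[-1] - theta_history[-2]) < ct_threshold:
        return "ct"
    if n_items >= max_items:
        return "max_items"
    return None
```

For example, `stop_reason([0.10, 0.105], [0.60, 0.55])` returns `"ct"`: the precision target has not been met, but the estimate has moved by only 0.005, which suggests that further items from the bank add little.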

Type
Original Paper
Copyright
Copyright © 2018 The Psychometric Society

Footnotes

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11336-018-9644-7) contains supplementary material, which is available to authorized users.

The R code and the real MGRM item parameters used in this paper are available online.

References

Anderson, T. W. (1984). An introduction to multivariate statistical analysis (2nd ed.). New York: Wiley.
Babcock, B., & Weiss, D. (2012). Termination criteria in computerized adaptive tests: Do variable-length CATs provide efficient and effective measurement? Journal of Computerized Adaptive Testing. https://doi.org/10.7333/1212-0101001
Boyd, A. M., Dodd, B. G., & Choi, S. W. (2010). Polytomous models in computerized adaptive testing. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 229–255). New York, NY: Routledge.
Cai, L. (2015). flexMIRT version 3: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Chang, H. H., & Ying, Z. L. (2008). To weight or not to weight? Balancing influence of initial items in adaptive testing. Psychometrika, 73(3), 441–450.
Cheng, Y., Guo, F., Chang, H., & Douglas, J. (2009). Constraint weighted a-stratification for computerized adaptive testing with nonstatistical constraints: Balancing measurement efficiency and exposure control. Educational and Psychological Measurement, 69, 35–49.
Choi, S. W., Grady, M. W., & Dodd, B. G. (2010). A new stopping rule for computerized adaptive testing. Educational and Psychological Measurement, 70, 1–17.
Daniel, M. H. (1999). Behind the scenes: Using new measurement methods on DAS and KAIT. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement (pp. 37–63). Mahwah, NJ: Lawrence Erlbaum Associates.
Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1989). Operational characteristics of adaptive testing procedures using the graded response model. Applied Psychological Measurement, 13, 129–143.
Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1993). Computerized adaptive testing using the partial credit model: Effects of item pool characteristics and different stopping rules. Educational and Psychological Measurement, 53, 61–77.
Fayers, P. M. (2007). Applying item response theory and computer adaptive testing: The challenges for health outcomes assessment. Quality of Life Research, 16, 187–194.
Gardner, W., Shear, K., Kelleher, K., Pajer, K., Mammen, O., Buysse, D., et al. (2004). Computerized adaptive measurement of depression: A simulation study. BMC Psychiatry, 4(13), 1–11.
Gershon, R. C. (2017). FastCAT—Customizing CAT administration rules to increase response efficiency. Paper presented at the 6th international conference on computerized adaptive testing, Niigata, Japan.
Gibbons, R. D., Weiss, D. J., Kupfer, D. J., Frank, E., Fagiolini, A., Grochocinski, V. J., et al. (2008). Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services, 59, 49–58.
Hart, D. L., Cook, K. F., Mioduski, J. E., Teal, C. R., & Crane, P. K. (2006). Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. Journal of Clinical Epidemiology, 59, 290–298.
Hart, D. L., Mioduski, J. E., & Stratford, P. W. (2005). Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. Journal of Clinical Epidemiology, 58, 629–638.
Hsieh, C.-A., von Eye, A. A., & Maier, K. S. (2010). Using a multivariate multilevel polytomous item response theory model to study parallel processes of change: The dynamic association between adolescents’ social isolation and engagement with delinquent peers in the National Youth Survey. Multivariate Behavioral Research, 45(3), 508–552.
Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology (Quantitative Psychology and Measurement). https://doi.org/10.3389/fpsyg.2016.00109
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Makransky, G., & Glas, C. A. W. (2013). The applicability of multidimensional computerized adaptive testing for cognitive ability measurement in organizational assessment. International Journal of Testing, 13, 123–139.
Maurelli, V., & Weiss, D. J. (1981). Factors influencing the psychometric characteristics of an adaptive testing strategy for test batteries (Research Rep. No. 81-4). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. Retrieved from https://eric.ed.gov/?id=ED212676
Michel, P., Baumstarck, K., Ghattas, B., Pelletier, J., Loundou, A., Boucekine, M., et al. (2016). A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire developed and validated for multiple sclerosis. The MusiQoL-MCAT. Medicine, 95(14), Article e3068.
Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74(2), 273–296.
Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York: Taylor and Francis.
Nikolaus, S., Bode, C., Taal, E., Vonkeman, H. E., Glas, C. A. W., & van der Laar, M. A. F. J. (2015). Working mechanism of a multidimensional computerized adaptive test for fatigue in rheumatoid arthritis. Health Qual Life Outcomes, 13, 23.
Samejima, F. (1969). Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331–354.
Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 101–133). Hillsdale, NJ: Lawrence Erlbaum.
Veldkamp, B. P., & van der Linden, W. J. (2002). Multidimensional adaptive testing with constraints on test content. Psychometrika, 67(4), 575–588.
Wang, C. (2014). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.
Wang, C. (2015). On latent trait estimation in multidimensional compensatory item response models. Psychometrika, 80, 428–449.
Wang, C., & Chang, H. (2011). Item selection in multidimensional computerized adaptive tests: Gaining information from different angles. Psychometrika, 76, 363–384.
Wang, C., Chang, H., & Boughton, K. (2011). Kullback–Leibler information and its applications in multidimensional adaptive tests. Psychometrika, 76, 13–39.
Wang, C., Chang, H., & Boughton, K. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.
Wang, C., Chang, H., & Douglas, J. (2012). Combining CAT with cognitive diagnosis: A weighted item selection approach. Behavior Research Methods, 44, 95–109.
Wang, C., Su, S., & Weiss, D. J. (2018). Robustness of parameter estimation to assumptions of normality in the multidimensional graded response model. Multivariate Behavioral Research, 53(3), 403–418.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.
Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2, 1–27.

Supplementary material

Wang et al. supplementary material 1 (File, 9 KB)
Wang et al. supplementary material 2 (File, 26.6 KB)