
Sequential Generalized Likelihood Ratio Tests for Online Item Monitoring

Published online by Cambridge University Press:  01 January 2025

Hyeon-Ah Kang*
Affiliation:
University of Texas at Austin
*Correspondence should be made to Hyeon-Ah Kang, University of Texas at Austin, Austin, USA. Email: [email protected]

Abstract

This study presents statistical procedures for monitoring the functioning of items over time. We propose generalized likelihood ratio tests that monitor multiple item parameters jointly, and we implement them with various sampling techniques to perform continuous or intermittent monitoring. The procedures examine the stability of item parameters across time and flag possible item compromise as soon as they identify a significant parameter shift. The performance of the monitoring procedures was evaluated using simulated and real-assessment data. The empirical evaluation suggests that the proposed procedures perform adequately in identifying parameter drift: they showed satisfactory detection power and gave timely signals while keeping error rates reasonably low. The procedures also outperformed existing methods. These findings suggest that multivariate parametric monitoring can provide an efficient and powerful control tool for maintaining the quality of items. The procedures allow joint monitoring of multiple item parameters and achieve sufficient power through likelihood-ratio tests. Based on the empirical findings, we suggest some practical strategies for performing online item monitoring.
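
To make the idea of sequential likelihood-ratio monitoring concrete, the following is a minimal sketch and not the procedure proposed in the article. It assumes a much simpler setting: a single item whose proportion of correct responses is monitored against a known baseline rate, whereas the article monitors multiple item parameters jointly. The function names (`bernoulli_loglik`, `glr_statistic`, `monitor`), the window size, and the alarm threshold are hypothetical choices made for illustration only.

```python
import math


def bernoulli_loglik(successes, n, p):
    """Log-likelihood of `successes` correct responses out of `n` at success rate `p`."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite when p is 0 or 1
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def glr_statistic(responses, p0, window):
    """Window-limited GLR statistic for a shift away from the baseline rate p0.

    For every candidate change point within the last `window` responses, the
    post-change rate is replaced by its maximum likelihood estimate, and the
    largest log-likelihood ratio against p0 is kept (doubled, as is customary).
    """
    best = 0.0
    successes = 0
    # Walk backwards in time so each step extends the post-change segment by one.
    for m, x in enumerate(reversed(responses[-window:]), start=1):
        successes += x
        p_hat = successes / m
        llr = bernoulli_loglik(successes, m, p_hat) - bernoulli_loglik(successes, m, p0)
        best = max(best, llr)
    return 2.0 * best


def monitor(responses, p0, threshold, window=50):
    """Return the 1-based index of the first alarm, or None if no alarm is raised."""
    for t in range(1, len(responses) + 1):
        if glr_statistic(responses[:t], p0, window) > threshold:
            return t
    return None


if __name__ == "__main__":
    stable = [1, 0] * 20            # item behaves as expected (rate 0.5)
    drifted = stable + [1] * 20     # item suddenly answered correctly by everyone
    print("alarm on stable item:", monitor(stable, p0=0.5, threshold=10.0))
    print("alarm on drifted item:", monitor(drifted, p0=0.5, threshold=10.0))
```

In this sketch the statistic is two-sided, so it reacts to an item becoming either easier or harder. The threshold here is arbitrary; in practice it would be calibrated to the error rate one is willing to tolerate.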

Type
Theory and Methods
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

