
Accurate Assessment via Process Data

Published online by Cambridge University Press:  01 January 2025

Susu Zhang
Affiliation:
University of Illinois at Urbana-Champaign
Zhi Wang
Affiliation:
Citadel Securities
Jitong Qi
Affiliation:
Columbia University
Jingchen Liu*
Affiliation:
Columbia University
Zhiliang Ying
Affiliation:
Columbia University
* Correspondence should be made to Jingchen Liu, Columbia University, New York, NY, USA. Email: [email protected]

Abstract

Accurate assessment of a student’s ability is the key task of a test. Assessments based on final responses are the standard. As testing infrastructure advances, substantially more information is observed. One such instance is process data, which are collected by computer-based interactive items and record a student’s detailed interaction process. In this paper, we show, both theoretically and with simulated and empirical data, that appropriately incorporating such information into the assessment substantially improves its precision.
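The intuition behind the precision claim can be illustrated with a standard variance-decomposition argument. The following is a minimal sketch in our own notation (latent ability θ, final responses Y, process data P), offered as an illustration rather than as the paper’s own derivation:

% Minimal compilable sketch of why conditioning on process data
% cannot reduce precision. The notation (theta, Y, P) is ours,
% not necessarily the paper's.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $\theta$ denote the latent ability, $Y$ the final responses, and
$P$ the process data. By the law of total variance,
\begin{equation*}
  \operatorname{Var}(\theta \mid Y)
  = \mathbb{E}\bigl[\operatorname{Var}(\theta \mid Y, P) \mid Y\bigr]
  + \operatorname{Var}\bigl(\mathbb{E}[\theta \mid Y, P] \mid Y\bigr)
  \;\ge\;
  \mathbb{E}\bigl[\operatorname{Var}(\theta \mid Y, P) \mid Y\bigr],
\end{equation*}
% so the expected posterior variance given responses and process
% data together is no larger than that given responses alone.
so conditioning on $P$ in addition to $Y$ cannot increase the
expected posterior variance of $\theta$, and strictly reduces it
whenever $\mathbb{E}[\theta \mid Y, P]$ genuinely depends on $P$.
\end{document}

The reduction is strict precisely when the process data carry information about ability beyond the final responses, which is the setting the paper investigates.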

Type
Theory and Methods
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

