
Accurate Assessment via Process Data

Published online by Cambridge University Press:  01 January 2025

Susu Zhang
Affiliation:
University of Illinois at Urbana-Champaign
Zhi Wang
Affiliation:
Citadel Securities
Jitong Qi
Affiliation:
Columbia University
Jingchen Liu*
Affiliation:
Columbia University
Zhiliang Ying
Affiliation:
Columbia University
* Correspondence should be made to Jingchen Liu, Columbia University, New York, NY, USA. Email: [email protected]

Abstract

Accurate assessment of a student’s ability is the key task of a test. Assessments based on final responses are the standard. As testing infrastructure advances, substantially more information is observed. One such instance is process data, which are collected by computer-based interactive items and record a student’s detailed interaction process. In this paper, we show, both theoretically and with simulated and empirical data, that appropriately incorporating such information into the assessment substantially improves its precision.
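The intuition behind the precision claim can be illustrated with a standard variance-decomposition argument. The following is a minimal sketch in our own notation (latent ability θ, final responses Y, process data P), offered as an illustration rather than as the paper’s own derivation:

% Minimal compilable sketch of why conditioning on process data
% cannot reduce precision. The notation (theta, Y, P) is ours,
% not necessarily the paper's.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $\theta$ denote the latent ability, $Y$ the final responses, and
$P$ the process data. By the law of total variance,
\begin{equation*}
  \operatorname{Var}(\theta \mid Y)
  = \mathbb{E}\bigl[\operatorname{Var}(\theta \mid Y, P) \mid Y\bigr]
  + \operatorname{Var}\bigl(\mathbb{E}[\theta \mid Y, P] \mid Y\bigr)
  \;\ge\;
  \mathbb{E}\bigl[\operatorname{Var}(\theta \mid Y, P) \mid Y\bigr],
\end{equation*}
% so the expected posterior variance given responses and process
% data together is no larger than that given responses alone.
so conditioning on $P$ in addition to $Y$ cannot increase the
expected posterior variance of $\theta$, and strictly reduces it
whenever $\mathbb{E}[\theta \mid Y, P]$ genuinely depends on $P$.
\end{document}

The reduction is strict precisely when the process data carry information about ability beyond the final responses, which is the setting the paper investigates.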

Type
Theory and Methods
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

