Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing

Fang Liu; Xiaojing Wang; Roeland Hancock; Ming-Hui Chen

doi:10.1007/s11336-022-09845-x

Published online by Cambridge University Press: 01 January 2025

Affiliations:
Fang Liu: Northeast Normal University
Xiaojing Wang*: University of Connecticut
Roeland Hancock: University of Connecticut
Ming-Hui Chen: University of Connecticut
*: Correspondence should be made to Xiaojing Wang, University of Connecticut, Storrs, CT 06250, USA. Email: [email protected]; URL: https://xiaojing-wang.uconn.edu

Abstract

Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate the RT data, for example, to aid ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for the corresponding paper-and-pencil scores. We then reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate the prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on the decomposition of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we conduct several simulation studies to examine the empirical performance of the proposed model assessment criteria and illustrate the application of these criteria using a real dataset from a computerized educational assessment program.
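As a rough illustration of the two quantities named in the abstract, the sketch below computes the standard, undecomposed DIC and LPML from a matrix of pointwise log-likelihoods evaluated at posterior draws. It is not the authors' decomposition, which this page does not spell out. The function names `lpml` and `dic` and the input layout (rows are MCMC draws, columns are observations) are assumptions made for this example.

```python
import numpy as np

def lpml(loglik):
    """LPML = sum_i log CPO_i, where CPO_i is the harmonic mean over
    posterior draws of the likelihood of observation i.

    loglik: array of shape (M draws, n observations) holding
            log f(y_i | theta_m); this layout is an assumption.
    """
    n_draws = loglik.shape[0]
    neg = -loglik
    # Stable log of mean_m exp(-loglik[m, i]) via the log-sum-exp trick.
    mx = neg.max(axis=0)
    log_mean_inv = mx + np.log(np.exp(neg - mx).sum(axis=0)) - np.log(n_draws)
    # log CPO_i = -log_mean_inv[i]
    return float(-log_mean_inv.sum())

def dic(loglik, loglik_at_mean):
    """DIC = D(theta_bar) + 2 * p_D, with p_D = mean deviance - D(theta_bar).

    loglik_at_mean: length-n array of log f(y_i | theta_bar), the pointwise
                    log-likelihood at the posterior mean of the parameters.
    Returns (DIC, p_D).
    """
    mean_deviance = -2.0 * loglik.sum(axis=1).mean()
    deviance_at_mean = -2.0 * float(np.sum(loglik_at_mean))
    p_d = mean_deviance - deviance_at_mean
    return deviance_at_mean + 2.0 * p_d, p_d
```

Smaller DIC and larger LPML indicate better fit. In a joint model such as the one described here, `loglik` would stack the contributions from the item responses, the response times, and the paper-and-pencil scores; how those contributions are separated is the subject of the article itself.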

Keywords

computerized tests; DIC decomposition; IRT models; LPML decomposition; paper-and-pencil tests; response times

Type: Theory and Methods
Information: Psychometrika, Volume 87, Issue 4, December 2022, pp. 1290-1317

DOI: https://doi.org/10.1007/s11336-022-09845-x
Copyright: Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/S0033312300005470a.
