
Automated Item Generation with Recurrent Neural Networks

Published online by Cambridge University Press: 01 January 2025

Matthias von Davier*
Affiliation: National Board of Medical Examiners
*Correspondence should be addressed to Matthias von Davier, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3102, USA. Email: [email protected]

Abstract

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in continuously renewing the item banks that sustain their testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types, such as those found in language-free intelligence tests (e.g., Raven's progressive matrices), or on an extensive analysis of task components and the derivation of schemata to produce items with pre-specified variability that are intended to have predictable levels of difficulty. Researchers invested in these earlier approaches may view the proposed approach with some skepticism; however, recent applications of machine learning have succeeded at tasks that seemed impossible for machines not long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike those used by Google Brain and Amazon Alexa for language processing and generation.
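The final sentence of the abstract names the core technique: a recurrent neural network trained as a probabilistic language model over existing item text, from which new candidate items are sampled one token at a time. The following minimal sketch illustrates that idea with a character-level LSTM in TensorFlow/Keras; the corpus file name, layer sizes, hyperparameters, and sampling loop are illustrative assumptions, not the paper's reported configuration.

# Minimal sketch of a character-level LSTM language model for generating
# candidate item text, in the spirit of the approach the abstract describes.
# The corpus file and all hyperparameters are assumptions for illustration.
import numpy as np
import tensorflow as tf

text = open("item_bank.txt").read()  # hypothetical corpus of existing items
chars = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(chars)}

seq_len = 40
# Fixed-length input windows, with the next character as the prediction target.
inputs = np.array([[char_to_id[c] for c in text[i:i + seq_len]]
                   for i in range(len(text) - seq_len)])
targets = np.array([char_to_id[text[i + seq_len]]
                    for i in range(len(text) - seq_len)])

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(chars), 64),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(inputs, targets, batch_size=128, epochs=10)

# Generate a candidate item by sampling one character at a time from the
# model's predictive distribution, sliding the context window forward.
seed = [char_to_id[c] for c in text[:seq_len]]
generated = []
for _ in range(200):
    probs = model.predict(np.array([seed]), verbose=0)[0].astype("float64")
    probs /= probs.sum()  # guard against floating-point drift in the softmax
    next_id = int(np.random.choice(len(chars), p=probs))
    generated.append(chars[next_id])
    seed = seed[1:] + [next_id]
print("".join(generated))

As is common for such models, sampling can be sharpened or flattened with a temperature parameter, trading fidelity to the training items against variety in the generated text.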

Type: Original Paper
Copyright: © 2018 The Psychometric Society

