
Generating Items During Testing: Psychometric Issues and Models

Published online by Cambridge University Press:  01 January 2025

Susan E. Embretson*
Affiliation: University of Kansas
* Requests for reprints should be sent to Susan E. Embretson, Department of Psychology, University of Kansas, Lawrence, Kansas.

Abstract

On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well-established principle of measuring persons from items with known psychometric properties. This paper examines the psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles for generating items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed, and a new IRT model is developed that includes the impact of generating principles on item discrimination as well as on difficulty. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed-content and adaptive testing are presented.
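To make the second issue concrete, the idea of calibrating generating principles rather than individual items can be sketched as follows. This is a minimal illustration, not the model developed in the paper: item difficulty is written as a weighted sum of cognitive-design features (in the spirit of LLTM-style decompositions), and discrimination is modeled analogously on the log scale so that it stays positive. The feature names, weights, and values are hypothetical.

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct response
    for a person with ability theta on an item with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def family_item_params(features, difficulty_weights, discrim_weights):
    """Predict item parameters from design features instead of calibrating
    each item individually. Difficulty is a weighted sum of features
    (LLTM-style); discrimination uses the same form on the log scale
    (a log link keeps a > 0). Weights here are illustrative, not estimated."""
    b = sum(w * f for w, f in zip(difficulty_weights, features))
    log_a = sum(w * f for w, f in zip(discrim_weights, features))
    return math.exp(log_a), b

# Hypothetical generated item scored on two design features
# (e.g., number of rules, level of abstraction).
features = [2.0, 1.0]
a, b = family_item_params(features,
                          difficulty_weights=[0.4, 0.3],
                          discrim_weights=[0.1, -0.05])
p = irt_2pl(theta=1.0, a=a, b=b)
```

Under this setup, any newly generated item with known feature values receives predicted parameters immediately, which is what allows items to be generated during testing without prior calibration of each specific item.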

Type: Original Paper
Copyright © 1999 The Psychometric Society


Footnotes

This article is based on the Presidential Address Susan E. Embretson gave on June 26, 1999 at the 1999 Annual Meeting of the Psychometric Society held at the University of Kansas in Lawrence, Kansas. —Editor
