
Varying the Valuating Function and the Presentable Bank in Computerized Adaptive Testing

Published online by Cambridge University Press: 10 January 2013

Juan Ramón Barrada* (Universidad Autónoma de Barcelona, Spain)
Francisco José Abad (Universidad Autónoma de Madrid, Spain)
Julio Olea (Universidad Autónoma de Barcelona, Spain)

*Correspondence concerning this article should be addressed to Juan Ramón Barrada, Facultad de Psicología, Universidad Autónoma de Barcelona, 08193 Bellaterra, Barcelona (Spain). Phone: +34-935813263. E-mail: [email protected]

Abstract

In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, which values items by the distance between the estimated trait level and the point where the item information function reaches its maximum. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that manipulating the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. Selecting several items with the matching criterion greatly improves item bank security with only small losses in accuracy. In general, it seems more appropriate not to stratify the bank.
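The two valuating functions compared in the abstract can be sketched in a few lines of code. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a three-parameter logistic (3PL) item bank with discrimination, difficulty, and guessing parameters a, b, and c; it uses Birnbaum's expression for the trait level at which a 3PL item's information peaks; and the function names (p3pl, fisher_info, max_info_point, next_item) and the n_matching switch between criteria are hypothetical, chosen only to show one possible way of administering some items with the matching criterion and the rest with Fisher information.

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL probability of a correct response at trait level theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher information of a 3PL item at trait level theta."""
    p = p3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def max_info_point(a, b, c):
    """Trait level where a 3PL item's information peaks (Birnbaum's formula)."""
    return b + np.log((1.0 + np.sqrt(1.0 + 8.0 * c)) / 2.0) / a

def next_item(theta_hat, a, b, c, administered, position, n_matching):
    """Select the next item: matching criterion for the first n_matching
    items of the test, Fisher information afterwards (one possible
    combination, assumed here for illustration)."""
    if position < n_matching:
        # Matching criterion: smallest distance between the estimated trait
        # level and the point of maximum information of each item.
        value = -np.abs(theta_hat - max_info_point(a, b, c))
    else:
        # Fisher information criterion: most informative item at theta_hat.
        value = fisher_info(theta_hat, a, b, c)
    value = np.where(administered, -np.inf, value)  # never readminister an item
    return int(np.argmax(value))
```

Under these assumptions, setting n_matching = 0 reproduces pure Fisher-information selection, while increasing it administers more items with the matching criterion; this is one way to picture the accuracy versus item bank security trade-off that the study manipulates.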


Type: Research Article
Copyright © Cambridge University Press 2011

