Certainty-Based Marking on Multiple-Choice Items: Psychometrics Meets Decision Theory

Published online by Cambridge University Press:  01 January 2025

Qian Wu, KU Leuven
Monique Vanerum, University of Hasselt
Anouk Agten, University of Hasselt
Andrés Christiansen, KU Leuven
Frank Vandenabeele, University of Hasselt
Jean-Michel Rigo, University of Hasselt
Rianne Janssen*, KU Leuven

* Correspondence should be made to Rianne Janssen, Center for Educational Effectiveness and Evaluation, KU Leuven, Dekenstraat 2, bus 3773, 3000 Leuven, Belgium. Email: [email protected]

Abstract

When a response to a multiple-choice item consists of selecting a single best answer, examiners cannot differentiate between a response that is a product of knowledge and one that is largely a product of uncertainty. Certainty-based marking (CBM) is a testing format that requires examinees to express their degree of certainty in the response option they have selected, leading to an item score that depends on both the correctness of the answer and the certainty expressed. The expected score is maximized if examinees truthfully report their level of certainty. However, prospect theory states that, because of varying risk attitudes, people do not always make the rational choice that yields the optimal outcome. By integrating a psychometric model and a decision-making perspective, the present case study examines the response behaviors of 334 first-year physiotherapy students on six multiple-choice examinations with CBM. We used item response theory to model the objective probability of students giving a correct response to an item, and cumulative prospect theory to estimate their risk attitudes when choosing which certainty level to report. The results showed that, with the given CBM scoring matrix, students’ choices of a certainty level were affected by their risk attitudes. Students were generally risk averse and loss averse when they had a high success probability on an item, leading to under-reporting of their certainty. Meanwhile, they were risk seeking when success probabilities on the items were small, resulting in over-reporting of certainty.
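The contrast between maximizing the expected score and choosing under cumulative prospect theory can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the three-level scoring matrix `SCORING` is a hypothetical CBM scheme, the function names are invented here, and the value and weighting functions use a power form and a one-parameter inverse-S form with default parameter values that are common in the prospect theory literature. The study's own scoring matrix, functional forms, and estimates may differ.

```python
# Hypothetical CBM scoring matrix (an assumption, not the study's matrix):
# certainty level -> (score if correct, score if incorrect)
SCORING = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def expected_score(p, level, scoring=SCORING):
    """Expected item score when the answer is correct with probability p."""
    gain, loss = scoring[level]
    return p * gain + (1 - p) * loss


def best_level_expected(p, scoring=SCORING):
    """Certainty level that maximizes the expected score."""
    return max(scoring, key=lambda lvl: expected_score(p, lvl, scoring))


def value(x, alpha=0.88, lam=2.25):
    """Power value function; lam > 1 makes losses loom larger than gains."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha


def weight(p, gamma):
    """Inverse-S probability weighting: overweights small p, underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


def cpt_value(p, level, scoring=SCORING, gamma_gain=0.61, gamma_loss=0.69):
    """Subjective value of a certainty level as a mixed two-outcome prospect:
    the gain occurs with probability p, the loss with probability 1 - p."""
    gain, loss = scoring[level]
    return weight(p, gamma_gain) * value(gain) + weight(1 - p, gamma_loss) * value(loss)


def best_level_cpt(p, scoring=SCORING):
    """Certainty level that maximizes the illustrative CPT value."""
    return max(scoring, key=lambda lvl: cpt_value(p, lvl, scoring))


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.9):
        print(p, best_level_expected(p), best_level_cpt(p))
```

Under this hypothetical matrix the expected score favors level 1 for success probabilities below 2/3, level 2 between 2/3 and 0.8, and level 3 above 0.8. Comparing `best_level_expected` with `best_level_cpt` over a grid of probabilities shows where a loss-averse examinee with these illustrative parameters would report a lower certainty level than the score-maximizing one.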

Type
Application Reviews and Case Studies
Copyright
Copyright © 2021 The Psychometric Society

Footnotes

The online version contains supplementary material available at https://doi.org/10.1007/s11336-021-09759-0.

Supplementary material

Wu et al. supplementary material 1 (File, 196.2 KB)
Wu et al. supplementary material 2 (File, 200.9 KB)
Wu et al. supplementary material 3 (File, 99.3 KB)
Wu et al. supplementary material 4 (File, 187.7 KB)
Wu et al. supplementary material 5 (File, 117.8 KB)
Wu et al. supplementary material 6 (File, 159.9 KB)
Wu et al. supplementary material 7 (File, 17.6 KB)