Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models

Published online by Cambridge University Press: 01 January 2025

Yang Liu* (University of Maryland)
Jan Hannig (The University of North Carolina at Chapel Hill)
Abhishek Pal Majumder (Stockholm University)

* Correspondence should be made to Yang Liu, Department of Human Development and Quantitative Methodology, University of Maryland, College Park, USA. Email: [email protected]

Abstract

In applications of item response theory (IRT), it is often of interest to compute confidence intervals (CIs) for person parameters with prescribed frequentist coverage. The ubiquitous use of short tests in social science research and practice calls for a refinement of standard interval estimation procedures based on asymptotic normality, such as the Wald and Bayesian CIs, which maintain desirable coverage only when the test is sufficiently long. In the current paper, we propose a simple construction of second-order probability matching priors for the person parameter in unidimensional IRT models, which in turn yields CIs with accurate coverage even when the test consists of only a few items. The probability matching property is established based on an expansion of the posterior distribution function and a shrinkage argument. CIs based on the proposed prior can be efficiently computed for a variety of unidimensional IRT models. A real data example with a mixed-format test and a simulation study are presented to compare the proposed method against several existing asymptotic CIs.
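To make the contrast between interval types concrete, the following is a minimal sketch, not the construction proposed in the paper. It assumes a two-parameter logistic (2PL) model with known, hypothetical item parameters (`A`, `C`) and compares a Wald CI with an equal-tailed posterior interval under the Jeffreys prior, which is the classical first-order probability matching prior in regular one-parameter models. The second-order prior of the paper is not reproduced here; all function names and numerical settings are illustrative assumptions.

```python
import math

# Hypothetical 2PL item parameters for a short 5-item test (assumed values).
A = [1.2, 0.8, 1.5, 1.0, 0.9]    # slopes
C = [0.5, -0.3, 0.0, 1.0, -0.8]  # intercepts


def prob(theta, a, c):
    """2PL probability of a correct response: logistic(a * theta + c)."""
    return 1.0 / (1.0 + math.exp(-(a * theta + c)))


def log_lik(theta, y):
    """Log-likelihood of the binary response pattern y at theta."""
    total = 0.0
    for a, c, u in zip(A, C, y):
        p = prob(theta, a, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def score(theta, y):
    """First derivative of the log-likelihood with respect to theta."""
    return sum(a * (u - prob(theta, a, c)) for a, c, u in zip(A, C, y))


def info(theta):
    """Test information: sum of a^2 * p * (1 - p) over items."""
    return sum(a * a * prob(theta, a, c) * (1.0 - prob(theta, a, c))
               for a, c in zip(A, C))


def mle(y, tol=1e-10, max_iter=200):
    """Maximum likelihood estimate by Newton steps, each clamped to [-1, 1]."""
    theta = 0.0
    for _ in range(max_iter):
        step = max(-1.0, min(1.0, score(theta, y) / info(theta)))
        theta += step
        if abs(step) < tol:
            break
    return theta


def wald_ci(y, z=1.959964):
    """Wald interval: theta_hat +/- z / sqrt(information at theta_hat)."""
    theta_hat = mle(y)
    se = 1.0 / math.sqrt(info(theta_hat))
    return theta_hat - z * se, theta_hat + z * se


def jeffreys_ci(y, level=0.95, lo=-8.0, hi=8.0, n=3201):
    """Equal-tailed posterior interval under the Jeffreys prior, on a grid."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    # Unnormalized posterior: likelihood times sqrt(information).
    dens = [math.exp(log_lik(t, y)) * math.sqrt(info(t)) for t in grid]
    total = sum(dens)
    alpha = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for t, d in zip(grid, dens):
        cum += d / total
        if lower is None and cum >= alpha:
            lower = t
        if upper is None and cum >= 1.0 - alpha:
            upper = t
    return lower, upper


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0]  # a mixed response pattern, so the MLE is finite
    print("Wald CI:    ", wald_ci(y))
    print("Jeffreys CI:", jeffreys_ci(y))
```

The Wald interval is symmetric about the maximum likelihood estimate by construction, whereas the posterior interval follows the shape of the likelihood, which is where the two approaches can diverge on short tests. The Wald interval is also undefined for all-correct or all-incorrect patterns, because the maximum likelihood estimate is then infinite.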

Type
Original Paper
Copyright
Copyright © 2019 The Psychometric Society

Footnotes

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11336-019-09675-4) contains supplementary material, which is available to authorized users.

Supplementary material

Liu et al. supplementary material 1 (File, 251.7 KB)
Liu et al. supplementary material 2 (File, 11.2 KB)
Liu et al. supplementary material 3 (File, 18 KB)
Liu et al. supplementary material 4 (File, 2.9 KB)