
Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

Published online by Cambridge University Press:  01 January 2025

Svend Kreiner*
Affiliation: University of Copenhagen

Karl Bang Christensen
Affiliation: University of Copenhagen

*Requests for reprints should be sent to Svend Kreiner, Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5B, PO Box 2029, 1014 Copenhagen K, Denmark. E-mail: [email protected]

Abstract

This paper addresses methodological issues concerning the scaling model used in the international comparison of student attainment in the Programme for International Student Assessment (PISA), specifically whether PISA’s ranking of countries is confounded by model misfit and differential item functioning (DIF). To investigate this, we reanalyzed the publicly available data on reading skills from the 2006 PISA survey. We also examined whether the ranking of countries is robust to the errors of the scaling model, by studying invariance across subscales and by comparing ranks based on the scaling model with ranks based on models that take some of the flaws of PISA’s scaling model into account. Our analyses provide strong evidence of misfit of the PISA scaling model and very strong evidence of DIF. These findings do not support the claim that the country rankings reported by PISA are robust.
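The robustness check summarized above amounts to comparing two rankings of the same countries produced by different scaling assumptions. As a toy illustration only (the data and model choices here are hypothetical, not the paper's), a rank-agreement coefficient such as Goodman and Kruskal's gamma can be computed directly:

```python
from itertools import combinations

def goodman_kruskal_gamma(rank_a, rank_b):
    """Rank agreement between two orderings of the same items:
    gamma = (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # A pair is concordant if both rankings order it the same way
        d = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if d > 0:
            concordant += 1
        elif d < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ranks of five countries under two scaling models
model_ranks = [1, 2, 3, 4, 5]
adjusted_ranks = [2, 1, 3, 5, 4]  # two adjacent swaps
print(goodman_kruskal_gamma(model_ranks, adjusted_ranks))  # → 0.6
```

A gamma near 1 would indicate that the two models order countries almost identically; values well below 1 would signal that the ranking is sensitive to the scaling assumptions.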

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society

