
Identifying and Supporting Academically Low-Performing Schools in a Developing Country: An Application of a Specialized Multilevel IRT Model to PISA-D Assessment Data

Published online by Cambridge University Press: 01 January 2025

Meredith Langi*
Affiliation:
NWEA
Minjeong Jeon
Affiliation:
University of California
*Correspondence should be made to Meredith Langi, NWEA, Portland, USA.

Abstract

Performance-targeted interventions are an important tool for improving educational outcomes and are often applied at the school level, where low-performing schools are selected for participation. In this paper, we aim to identify low-performing schools in Cambodia that need support in improving students’ abilities in formulating math problems. Using data from the PISA for Development (PISA-D) project, we present an application of a structured multilevel mixture item response theory (IRT) model that uses strategic constraints to achieve our research aims. The approach draws on psychometric traditions in multilevel IRT modeling, mixture IRT modeling, and constrained mixture IRT modeling. Results support the classification of Cambodian schools participating in PISA-D as low- and non-low-performing schools and provide insight into these schools’ various contexts. Implications for future school interventions in Cambodia, as well as future extensions to this modeling approach, are discussed.

Type
Application Reviews and Case Studies
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

