A Systematic Review of Argument-Based Validation Studies in the Field of Language Testing (2000–2018)

doi:10.1017/9781108669849.005

Chapter 3, from Part I - Basic Concepts and Uses of Validity Argument in Language Testing and Assessment

Published online by Cambridge University Press: 14 January 2021

Ahmet Dursun and Zhi Li

Edited by Carol A. Chapelle and Erik Voss

Carol A. Chapelle, Iowa State University
Erik Voss, Teachers College, Columbia University

Summary

Since the publication of Kane (2006) on argument-based validation and the validation project by Chapelle, Enright, and Jamieson (2008), a trend of employing the argument-based approach in language testing validation research has emerged, as observed by Chapelle and Voss (2013). To better understand this trend, this systematic review identified and analyzed the argument-based validation studies published from 2000 to 2018. A comprehensive literature search was conducted with multiple search terms (e.g., validity, argument-based validation, inferences) across a variety of publication sources, including peer-reviewed academic journals, research reports, and dissertations. After pre-established inclusion criteria were applied, 70 studies were retained: 45 journal articles or research reports and 25 doctoral dissertations. The claims and inferences employed in these studies were analyzed into themes and categorized under the framework of Chapelle, Enright, and Jamieson (2008). In addition, the research methodology addressing the warrants, rebuttals, and backing in each study was documented and reviewed. Based on the results of this analysis, we make suggestions about constructing interpretation and use arguments and about evaluating the coherence and plausibility of validity arguments in various testing contexts.

Keywords

argument-based approach; language testing; research reports; journal articles; language assessment research

Type: Chapter
Information: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 45–70

DOI: https://doi.org/10.1017/9781108669849.005

Publisher: Cambridge University Press

Print publication year: 2021

References

Barkaoui, K. (2014). Examining the impact of L2 proficiency and keyboarding skills on scores on TOEFL-iBT writing tasks. Language Testing, 31(2), 241–259. https://doi.org/10.1177/0265532213509810

Barkaoui, K. (2015). Test takers’ writing activities during the TOEFL iBT® writing tasks: A stimulated recall study. ETS Research Report Series, (1), 1–42. https://doi.org/10.1002/ets2.12050

Barkaoui, K., & Knouzi, I. (2018). The effects of writing mode and computer ability on L2 test-takers’ essay characteristics and scores. Assessing Writing, 46, 19–31. https://doi.org/10.1016/j.asw.2018.02.005

Becker, A. P. (2011). Building evidence for the evaluation of English learners’ writing scores. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Becker, A. (2018). Not to scale? An argument-based inquiry into the validity of an L2 writing rating scale. Assessing Writing, 37, 1–12. https://doi.org/10.1016/j.asw.2018.01.001

Bejar, I. I., Deane, P. D., Flor, M., & Chen, J. (2017). Evidence of the generalization and construct representation inferences for the GRE® revised General Test sentence equivalence item type. ETS Research Report Series, (1), 1–25. https://doi.org/10.1002/ets2.12134

Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task types on the TOEFL iBT: A lexico-grammatical analysis. ETS TOEFL Research Report Series.

Bogorevich, V. (2018). Native and non-native raters of L2 speaking performance: Accent familiarity and cognitive processes. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Carroll, P. E., & Bailey, A. L. (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing, 33(1), 23–52. https://doi.org/10.1177/0265532215576380

Chapelle, C. A., Chung, Y.-R., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469. https://doi.org/10.1177/0265532210367633

Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405. https://doi.org/10.1177/0265532214565386

Checa-García, I., & Guiberson, M. (2019). Test validity in morphosyntactic measures for typical and SLI incipient Spanish–English bilinguals. Language Testing, 36(1), 77–100. https://doi.org/10.1177/0265532217724603

Cheng, L., & Sun, Y. (2015). Interpreting the impact of the Ontario Secondary School Literacy Test on second language students within an argument-based validation framework. Language Assessment Quarterly, 12(1), 50–66.

Chung, Y.-R. (2014). A test of productive English grammatical ability in academic writing: Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Deygers, B., van den Branden, K., & van Gorp, K. (2018). University entrance language tests: A matter of justice. Language Testing, 35(4), 449–476. https://doi.org/10.1177/0265532217706196

Doe, C. D. (2013). Validating the Canadian academic English language assessment for diagnostic purposes from three perspectives: Scoring, teaching, and learning. Unpublished doctoral dissertation, Queen’s University, Kingston, ON.

Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27(3), 317–334.

Esfandiari, M. R., Riasati, M. J., Vaezian, H., & Rahimi, F. (2018). A quantitative analysis of TOEFL iBT using an interpretive model of test validity. Language Testing in Asia, 8(1), 7. https://doi.org/10.1186/s40468-018-0062-7

Frost, K., Elder, C., & Wigglesworth, G. (2011). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3), 345–369. https://doi.org/10.1177/0265532211424479

Gaillard, S. (2014). The elicited imitation task as a method for French proficiency assessment in institutional and research settings. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL.

Gu, L., Lockwood, J., & Powers, D. E. (2015). Evaluating the TOEFL Junior® standard test as a measure of progress for young English language learners. ETS Research Report Series. https://doi.org/10.1002/ets2.12064

Harsch, C., Ushioda, E., & Ladroue, C. (2017). Investigating the predictive validity of TOEFL iBT® test scores and their use in informing policy in a United Kingdom University setting. ETS Research Report Series, (1), 1–80. https://doi.org/10.1002/ets2.12167

He, L., & Min, S. (2017). Development and validation of a computer adaptive EFL test. Language Assessment Quarterly, 14(2), 160–176. https://doi.org/10.1080/15434303.2016.1162793

Isbell, D. R. (2017). Assessing C2 writing ability on the Certificate of English Language Proficiency: Rater and examinee age effects. Assessing Writing, 34, 37–49. https://doi.org/10.1016/j.asw.2017.08.004

Jia, Y. (2013). Justifying the use of a second language oral test as an exit test in Hong Kong: An application of assessment use argument framework. Unpublished doctoral dissertation, University of California, Los Angeles.

Johnson, R. C. (2011). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Unpublished doctoral dissertation, Macquarie University, Sydney, Australia.

Johnson, R. C., & Riazi, A. M. (2015). Accuplacer Companion in a foreign language context: An argument-based validation of both test score meaning and impact. Papers in Language Testing and Assessment, 4(1), 31–58.

Jun, H. S. (2014). A validity argument for the use of scores from a web-search-permitted and web-source-based integrated writing test. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Kadir, A. K. (2008). Framing a validity argument for test use and impact: The Malaysian public service experience. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL.

Kelly-Riley, D., & Elliot, N. (2014). The WPA Outcomes Statement, validation, and the pursuit of localism. Assessing Writing, 21, 89–103.

Kim, E.-Y. J. (2017). The TOEFL iBT writing: Korean students’ perceptions of the TOEFL iBT writing test. Assessing Writing, 33, 1–11. https://doi.org/10.1016/J.ASW.2017.02.001

Klebanov, B., Ramineni, C., Kaufer, D., Yeoh, P., & Ishizaki, S. (2017). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125–144. https://doi.org/10.1177/0265532217740752

Knoch, U., Macqueen, S., & O’Hagan, S. (2014). An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® Writing Test. ETS Research Report Series. https://doi.org/10.1002/ets2.12038

Koizumi, R., In’nami, Y., Asano, K., & Agawa, T. (2016). Validity evidence of Criterion® for assessing L2 writing proficiency in a Japanese university context. Language Testing in Asia, 6(5), 1–26. https://doi.org/10.1186/s40468-016-0027-7

Kumazawa, T., Shizuka, T., Mochizuki, M., & Mizumoto, M. (2016). Validity argument for the VELC Test® score interpretations and uses. Language Testing in Asia, 16, 1.

Kyle, K., Crossley, S. A., & McNamara, D. S. (2016). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33(3), 319–340. https://doi.org/10.1177/0265532215587391

LaFlair, G. T., & Staples, S. (2017). Using corpus linguistics to examine the extrapolation inference in the validity argument for a high-stakes speaking assessment. Language Testing, 34(4), 451–475. https://doi.org/10.1177/0265532217713951

Lallmamode, S. P., Mat Daud, N., & Abu Kassim, N. L. (2016). Development and initial argument-based validation of a scoring rubric used in the assessment of L2 writing electronic portfolios. Assessing Writing, 30, 44–62. https://doi.org/10.1016/j.asw.2016.06.001

Lesnov, R. (2018). The role of content-rich visuals in the L2 academic listening assessment construct. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Li, S. (2018). Developing a test of L2 Chinese pragmatic comprehension ability. Language Testing in Asia, 8(1), 3. https://doi.org/10.1186/s40468-018-0054-7

Li, Z. (2015a). Using a self-assessment of English use as a tool to validate the English Placement Test. Papers in Language Testing and Assessment, 3(2), 59–96.

Li, Z. (2015b). An argument-based validation study of the English Placement Test (EPT): Focusing on the inferences of extrapolation and ramification. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Lim, G. S. (2009). Prompt and rater effects in second language writing performance assessment. Unpublished doctoral dissertation, University of Michigan, Ann Arbor, MI.

Link, S. M. (2015). Development and validation of an automated essay scoring engine to assess students’ development across program levels. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Llosa, L. (2005). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency. Unpublished doctoral dissertation, University of California, Los Angeles.

Llosa, L. (2007). Validating a standards-based classroom assessment of English proficiency: A multitrait-multimethod approach. Language Testing, 24(4), 489–515.

Llosa, L., & Malone, M. E. (2018). Comparability of students’ writing performance on TOEFL iBT and in required university writing courses. Language Testing. https://doi.org/10.1177/0265532218763456

Mendoza, A., & Knoch, U. (2018). Examining the validity of an analytic rating scale for a Spanish test for academic purposes using the argument-based approach to validation. Assessing Writing, 35, 41–55. https://doi.org/10.1016/j.asw.2017.12.003

Mozgalina, A. (2015). Applying an argument-based approach for validating language proficiency assessments in second language acquisition research: The elicited imitation test for Russian. Unpublished doctoral dissertation, Georgetown University, Washington, DC.

Oh, S. R. (2018). Investigating test-takers’ use of linguistic tools in second language academic writing assessment. Unpublished doctoral dissertation, Teachers College, Columbia University, New York, NY.

Pardo-Ballester, C. (2007). The development of a web-based Spanish listening placement exam. Unpublished doctoral dissertation, University of California, Davis, CA.

Pardo-Ballester, C. (2010). The validity argument of a web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly, 7(2), 137–159.

Park, M. (2015). Development and validation of virtual interactive tasks for an aviation English assessment. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Riazi, A. M. (2016). Comparing writing performance in TOEFL-iBT and academic assignments: An exploration of textual features. Assessing Writing, 28, 15–27. https://doi.org/10.1016/j.asw.2016.02.001

Santos, V. (2017). A computer-adaptive test of productive and contextualized academic vocabulary breadth in English (CAT-PAV): Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Sawaki, Y., & Sinharay, S. (2013). Investigating the value of section scores for the TOEFL iBT® TEST. ETS Research Report Series, (2), i–113. https://doi.org/10.1002/j.2333-8504.2013.tb02342.x

Sawaki, Y., & Sinharay, S. (2018). Do the TOEFL iBT® section scores provide value-added information to stakeholders? Language Testing, 35(4), 529–556. https://doi.org/10.1177/0265532217716731

Schmidgall, J. E. (2017). The consistency of TOEIC® speaking scores across ratings and tasks. ETS Research Report Series, (1), 1–8. https://doi.org/10.1002/ets2.12178

Schmidgall, J. E., Getman, E. P., & Zu, J. (2018). Screener tests need validation too: Weighing an argument for test use against practical concerns. Language Testing, 35(4), 583–607. https://doi.org/10.1177/0265532217718600

Sims, J. M., & Kunnan, A. J. (2016). Developing evidence for a validity argument for an English placement exam from multi-year test performance data. Language Testing in Asia, 6(1), 1. https://doi.org/10.1186/s40468-016-0024-x

Tominaga, W. (2014). Validating the scoring inference of the Japanese OPI ratings: The use of extended turns, connective expressions, and discourse organization. Unpublished doctoral dissertation, University of Hawai’i at Manoa.

Trace, J. (2017). A validation argument for cloze test item function in second language assessment. Unpublished doctoral dissertation, University of Hawai’i at Manoa.

Trace, J., Janssen, G., & Meier, V. (2017). Measuring the impact of rater negotiation in writing performance assessment. Language Testing, 34(1), 3–22. https://doi.org/10.1177/0265532215594830

Wang, H. (2010). Investigating the justifiability of an additional test use: An application of assessment use argument to an English as a foreign language test. Unpublished doctoral dissertation, University of California, Los Angeles.

Weigle, S. C. (2011). Validation of automated scores of TOEFL iBT® tasks against nontest indicators of writing ability. ETS Research Report Series, (2), i–63.

Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Yang, H. (2016). Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Youn, S. J. (2013). Validating task-based assessment of L2 pragmatics in interaction using mixed methods. Unpublished doctoral dissertation, University of Hawai’i at Manoa.

Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199–225. https://doi.org/10.1177/0265532214557113

Xi, X., Higgins, D., Zechner, K., & Williamson, D. M. (2008). Automated scoring of spontaneous speech using SpeechRater v.1.0. ETS Research Report Series, (2).

References

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1–34. https://doi.org/10.1207/s15434311laq0201_1

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford: Oxford University Press.

Brennan, R. L. (2013). Commentary on “Validating the interpretations and uses of test scores.” Journal of Educational Measurement, 50(1), 74–83. https://doi.org/10.1111/jedm.12001

Chapelle, C. A. (2021). Argument-based validation in testing and assessment. Thousand Oaks, CA: Sage Publications.

Chapelle, C. A., Enright, M. K., & Jamieson, J. (2008). Building a validity argument for the Test of English as a Foreign Language. New York and London: Routledge.

Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13.

Chapelle, C. A., & Voss, E. (2013). Evaluation of language tests through validation research. In Kunnan, A. J. (Ed.), The companion to language assessment III:9:65 (pp. 1079–1097). Chichester: John Wiley and Sons, Inc.

Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

Kane, M. T. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspectives, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1

Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Greenwood Publishing.

Kane, M. T. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3–17.

Kane, M. T. (2013a). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. http://doi.org/10.1111/jedm.12000

Kane, M. T. (2013b). Validation as a pragmatic, scientific activity. Journal of Educational Measurement, 50(1), 115–122. http://doi.org/10.1111/jedm.12007

Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192

Messick, S. (1989). Validity. In Linn, R. (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education.

Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41(4), 805–815. https://doi.org/10.1002/j.1545-7249.2007.tb00105.x

Siddaway, A. P., Wood, A. M., & Hedges, L. V. (2019). How to do a systematic review: A best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annual Review of Psychology, 70(1), 747–770. https://doi.org/10.1146/annurev-psych-010418-102803

Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147–170.
