The Telephone Standard Speaking Test

doi:10.1017/9781108669849.010

7 - The Telephone Standard Speaking Test

An Outside Evaluator’s Investigation of a Rebuttal to the Generalization Inference

from Part II - Investigating Score Interpretations

Published online by Cambridge University Press: 14 January 2021

Rie Koizumi

Edited by Carol A. Chapelle (Iowa State University) and Erik Voss (Teachers College, Columbia University)


Summary

The argument-based validation research reported in this chapter was conducted from the perspective of an outside evaluator concerned about the consistency of scores on the Telephone Standard Speaking Test (TSST), a telephone-based test of second language (L2) English speaking proficiency used to assess improvement in speaking proficiency over time. This test use requires that the warrant for generalization be plausible. The warrant states that observed scores are estimates of expected scores, which are consistent across test tasks, forms, occasions, and raters. To guide the investigation, a rebuttal was formulated: observed scores fail to estimate expected scores because of error introduced in the testing process. The research investigated two of the rebuttal’s assumptions. TSST scores from 55 undergraduates at two Japanese universities, collected twice within a month, indicated that the test forms had the same means and SDs and that each participant’s two scores were highly correlated. One-third of the scores for the same individual differed by one score level. Thus, the results provided partial support for one of the assumptions underlying the rebuttal. The chapter concludes by highlighting the important role of rebuttals for including threats of concern to test users in an interpretation/use argument.

Keywords

speaking proficiency; rating scale; generalization inference; telephone-based test; rebuttal

Type: Chapter
Information: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 154–175

DOI: https://doi.org/10.1017/9781108669849.010

Publisher: Cambridge University Press

Print publication year: 2021

