LIMITATIONS OF SIZE AND LEVELS TESTS OF WRITTEN RECEPTIVE VOCABULARY KNOWLEDGE

Published online by Cambridge University Press:  29 June 2020

Tim Stoeckel* (University of Niigata Prefecture)
Stuart McLean (Momoyama Gakuin University)
Paul Nation (Victoria University of Wellington)
*Correspondence concerning this article should be addressed to Tim Stoeckel, University of Niigata Prefecture, 471 Ebigase, Higashi-ku, Niigata City, Niigata 950-8680, Japan. E-mail: [email protected]

Abstract

Two commonly used test types to assess vocabulary knowledge for the purpose of reading are size and levels tests. This article first reviews several frequently stated purposes of such tests (e.g., materials selection, tracking vocabulary growth) and provides a reasoned argument for the precision needed to serve such purposes. Then three sources of inaccuracy in existing tests are examined: the overestimation of lexical knowledge from guessing or use of test strategies under meaning-recognition item formats; the overestimation of vocabulary knowledge when receptive understanding of all word family members is assumed from a correct response to an item assessing knowledge of just one family member; and the limited precision that a small, random sample of target words has in representing the population of words from which it is drawn. The article concludes that existing tests lack the accuracy needed for many specified testing purposes and discusses possible improvements going forward.
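The third source of inaccuracy, sampling precision, can be made concrete with a small worked example. The sketch below is illustrative only and is not presented as the authors' own procedure: it assumes that an exact (Clopper-Pearson) binomial confidence interval is a reasonable way to express how well a small random sample of items represents a word band, and it uses a hypothetical case of 10 items sampled from a 1,000-word band. The function names `binom_cdf`, `_bisect`, and `clopper_pearson` are invented for this illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Locate the root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion, given k correct out of n items."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2 (0 when k == 0).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper limit: the p at which P(X <= k) equals alpha / 2 (1 when k == n).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical scenario: 5 of 10 sampled items answered correctly,
    # with the sample standing in for a 1,000-word band.
    band_size = 1000
    low, high = clopper_pearson(5, 10)
    print(f"Proportion known: {low:.3f} to {high:.3f}")
    print(f"Words known in band: {low * band_size:.0f} to {high * band_size:.0f}")
```

Under these assumptions, the width of the interval shows directly how much uncertainty a short item sample leaves about the band as a whole. Rerunning the function with a larger `n` at the same proportion correct shows how the interval narrows as more items are sampled.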

Type: State of the Scholarship
Copyright: © The Author(s), 2020. Published by Cambridge University Press

Footnotes

We would like to extend special thanks to Phil Bennett, Dale Brown, and Brandon Kramer for their kind and useful suggestions for improving this article.
