Predicting vocabulary knowledge in adult L2 learners: The role of word-level variables across educational backgrounds

Marieke Vanbuel

doi:10.1017/S1366728924000889

Published online by Cambridge University Press: 17 January 2025

Marieke Vanbuel*
Affiliation: Ghent University, Department of Translation, Interpreting and Communication, Ghent, Belgium
*Corresponding author: Marieke Vanbuel; Email: [email protected]

Abstract

This study examines how word characteristics such as frequency, concreteness, part of speech and length predict Dutch vocabulary knowledge in 763 adult migrant L2 learners who vary widely in their educational levels in their L1, from minimal to extensive formal education. While the impact of these features on vocabulary learning is well documented among tertiary-educated adult L2 learners and adolescent L2 learners in the academic track of secondary education, it has hardly been explored among low-educated adult L2 learners. Findings confirm that word frequency, concreteness and length significantly predict receptive vocabulary knowledge, aligning with prior research. However, the study also reveals variations in the predictive power of word frequency and length among adults with different educational backgrounds. These results highlight the need to reassess the applicability of findings from current research on L2 receptive vocabulary, particularly concerning adult learners with reduced educational backgrounds.

Keywords

receptive vocabulary; second language acquisition; LESLLA; frequency; concreteness; word length; word-related variables

Type: Research Article
Information: Bilingualism: Language and Cognition, First View, pp. 1–13

DOI: https://doi.org/10.1017/S1366728924000889
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Open Practices: Open data
Copyright: © The Author(s), 2025. Published by Cambridge University Press

There is a consensus in second language acquisition (SLA) research that certain words are easier for second language (L2) learners to acquire than others, attributed in part to intrinsic word characteristics (Laufer, 1997). Understanding these characteristics and their impact on L2 vocabulary acquisition is crucial, given the pivotal role of a comprehensive lexicon in comprehension tasks (e.g., Hu & Nation, 2000; Zhang & Zhang, 2022). Moreover, these insights can also be relevant for L2 instruction. As researchers have argued in the past, it is impossible to teach all the words that learners need to know in the L2 classroom due to limitations in the instruction time (e.g., Webb, 2020). As such, insights into word features can aid educators in selecting words for explicit instruction.

Over time, various word characteristics such as frequency, concreteness, age of onset and linguistic distance have emerged as predictors of L2 learners’ vocabulary knowledge (Peters, 2020). Research has also shown that these factors can have a differential impact on the L2 vocabulary acquisition of learners who differ in language proficiency level (De Wilde et al., 2020) or age (Puimège & Peters, 2019a). However, one area that remains relatively underexplored in the study of vocabulary knowledge is the impact of a learner’s educational background in their native language.

This oversight is notable because a considerable number of L2 learners have had no or only limited schooling in their L1 and are either unable to read and write in any language that they may speak, or only have limited command of written language (UNESCO, 2022). Even though these learners – referred to as LESLLA learners (Literacy Education and Second Language Learning for Adults; Tarone, 2010) – constitute a sizeable segment of the L2 learning population, estimated to be between 20 and 40% (e.g., Carlsen & Rocca, 2021; D’Agostino & Mocciaro, 2021; Vágvölgyi et al., 2016), they are largely underrepresented in SLA research (Tarone, 2010). Most studies in the domain of SLA rely on respondents who have attended or are attending formal secondary or tertiary education and have a functional command of written language. In doing so, the field underrepresents the diversity in the L2 learner population (Andringa & Godfroid, 2020; Ortega, 2005). As such, several researchers have called for research that includes learners who may not fit that profile (e.g., Ortega, 2005; Tarone, 2010).

Furthermore, research with L1 adults demonstrates that limited or interrupted educational experiences and reduced literacy levels significantly affect learning subskills crucial for acquiring a second language and vocabulary in particular. Schooling and literacy, which are interrelated (Huettig & Mishra, 2014; UNESCO, 2022), affect decoding skills, oral language processing, working memory and metalinguistic abilities (Kolinsky, 2015; Kurvers, 2015). This suggests that the impact of word-level characteristics on vocabulary acquisition may differ markedly between LESLLA learners and more educated adult L2 learners. A word-level characteristic such as ‘concreteness’ may have a more substantial impact on the learnability of a word when learners struggle with abstract concepts (Huettig & Mishra, 2014; Kolinsky, 2015). Additionally, LESLLA learners’ relatively lower phonological decoding skills (Kolinsky, 2015) may hinder their ability to process, repeat and store longer words in memory (Kosmidis et al., 2006).

Despite these insights, few studies have examined vocabulary acquisition among LESLLA learners. One recent study showed that LESLLA learners have a significantly smaller vocabulary than their more educated counterparts at similar L2 proficiency levels (Deygers & Vanbuel, 2022). Yet, this study did not explore to what extent word-level variables influence vocabulary knowledge in L2 adults with diverging levels of education, although the data allow this type of analysis.

This study leverages the dataset from Deygers (2023) in order to investigate which word-level variables predict L2 vocabulary knowledge in adult learners with varying educational backgrounds. Specifically, it examines the differential effects of these variables on L2 vocabulary knowledge among adult learners with primary, secondary or tertiary educational backgrounds in their L1. Vocabulary knowledge was measured using the Peabody Picture Vocabulary Test (PPVT) in Dutch (Dunn et al., 2005) at the end of the formal L2 course. By including LESLLA learners in the study, this research contributes to a more comprehensive understanding of vocabulary acquisition in L2 learners and offers practical implications for language instruction. Moreover, by analyzing word-level factors, the study provides a more granular understanding of why certain words might be more challenging for adult L2 learners with different levels of schooling in their L1.

1. Literature review

There is general agreement that second language (L2) learners need to have a relatively extensive vocabulary knowledge in order to be able to read (e.g., Hu & Nation, 2000; Laufer & Ravenhorst-Kalovski, 2010; Nation, 2006; Zhang & Zhang, 2022) or listen for comprehension (van Zeeland & Schmitt, 2013; Zhang & Zhang, 2022). Estimates for English language learners suggest that understanding spoken language requires knowledge of 3000 word families (van Zeeland & Schmitt, 2013), while comprehending written text demands familiarity with 8000 to 9000 word families (Nation, 2006). Moreover, a deep understanding of these words, particularly the form-meaning mapping, is crucial for effective language use (Nation, 2020, p. 18). Although vocabulary knowledge entails different dimensions (Nation, 2020), in this study we focus on one, namely receptive vocabulary knowledge at the level of meaning recognition, as the ‘spoken word form and the form-meaning connection’ are ‘the first aspects that would be learned for most words’ (Nation, 2020, p. 25). This means that we define vocabulary knowledge as ‘passive knowledge’ and ‘the ability to recognize the meaning [of a word] in a set of options’ (Laufer et al., 2004, p. 206; Schmitt, 2014).

Research has suggested that certain words require more effort for L2 learners to acquire – a phenomenon often referred to as a word’s ‘learning burden’ (Nation, 2020). In the following sections, we delve into word-level variables that have been identified as predictors of the ease or difficulty in learning words.

1.1. The impact of word characteristics on L2 vocabulary knowledge

Research has identified several word-related factors that predict the effort needed to learn a word (Peters, 2020). In this study, we focus on word use characteristics such as frequency, word meaning characteristics (e.g., concreteness), word form characteristics (e.g., word length, part of speech) and interlanguage effects (e.g., cross-linguistic similarity).

Although more factors (e.g., polysemy, L1 frequency) have been identified to date (see Peters, 2020, for a recent overview), we discuss only the factors that are most relevant in the context of receptive vocabulary learning in beginner adult migrant L2 learners with diverging educational and L1 backgrounds.

Word frequency – as inferred from linguistic corpora – is perhaps the most-researched word-level variable in vocabulary research. There is general consensus that vocabulary knowledge is built primarily on word occurrence statistics (Durrant et al., 2022). To learn a word, learners typically need sufficient encounters with that word (Pellicer-Sánchez, 2016). As such, the more frequent a word is (i.e., the more commonly used it is), the more likely learners are to encounter it often. These words are thus more likely to be acquired quickly (Durrant et al., 2022) and to be known to learners (Crossley et al., 2016). This word frequency effect may be even more pronounced for L2 words than it is for L1 words (Durrant et al., 2022) and has been found in university-educated learners (e.g., Crossley et al., 2016), primary school students (e.g., Puimège & Peters, 2019a) and preliterate children (e.g., Verhagen et al., 2022). It is important to note, however, that although many studies have confirmed the impact of word frequency on vocabulary knowledge, its role has been qualified in recent years. Schmitt et al. (2021), for instance, found very low correlations between frequency rankings and knowledge rankings of individual words, albeit only for form recall and not for form or meaning recognition. De Wilde’s study (2023), too, revealed that word frequency was not a significant predictor of English vocabulary knowledge among Dutch-speaking primary school pupils. She hypothesized that the high prevalence of English-Dutch cognates may mitigate frequency effects (see below). Additionally, the frequency effect seems to be much stronger for words that are acquired through written input (i.e., reading) than for words acquired through spoken input (i.e., listening) (Peters & Webb, 2018; Vidal, 2011).

Other word-level factors that can predict vocabulary knowledge are semantic or phonological in nature. One of the semantic factors affecting L2 vocabulary knowledge is word concreteness (Laufer, 1997). Brysbaert et al. (2014, p. 5) define concreteness as ‘the degree to which a concept denoted by a word refers to a perceptible entity’. The main theory guiding this principle is the dual coding theory (Paivio, 2013, in Brysbaert et al., 2014). It is assumed that upon hearing a concrete word, such as ‘apple’, perceptual memory of the object that the word refers to is activated in parallel with the word-meaning mapping. Because of dual coding, concrete words are considered easier to remember and to learn than more abstract words. Various empirical studies with both university-educated and primary school students appear to confirm the dual coding hypothesis (e.g., Brysbaert et al., 2014; Keuleers, Brysbaert & New, 2010; Puimège & Peters, 2019a; Verhagen et al., 2022).

Another factor that influences vocabulary knowledge is word length. Typically, longer words take longer to articulate, which results in a disadvantage in phonological rehearsal and thus storage of the novel word (Nishiyama, 2020). In general, word length seems to impact learning, with learners typically being more likely to know short words (i.e., words with fewer phonemes or letters) than long words (e.g., Ellis & Beaton, 1993; Willis & Ohashi, 2012). However, not all studies confirm these findings, suggesting a modality effect or a differential effect of word length on different aspects of vocabulary knowledge. In an incidental vocabulary learning experiment from multimodal input with young EFL learners and form recall as outcome measure, Puimège & Peters (2019b) found that longer words were more likely to be acquired than shorter words. They argued that the salience of longer words in spoken input could explain this result (Crossley et al., 2016). Similar to the effect of word frequency (cf. Peters & Webb, 2018), there might also be a modality effect on the relationship between word length and vocabulary knowledge. Additionally, Barclay & Pellicer-Sánchez (2021) investigated the impact of L2 word length on learning burden among 48 EFL university students with different L1s. Their findings indicated that learners encountered greater difficulty in recalling the form of longer words compared to shorter ones, although the effect of word length on form recognition was less prominent. Notably, their study did not assess meaning recognition.

Part of speech (PoS; i.e., a word’s grammatical category or word class) is also mentioned as a factor that influences L2 vocabulary knowledge, but empirical evidence is relatively thin (Durrant et al., 2022; Peters, 2020). Typically, young children acquire nouns before adjectives, verbs and adverbs (Gentner, 1982; Goodman et al., 2008; Crossley et al., 2016). This trend is especially pronounced in Indo-European languages, as recent studies indicate that in languages in which verbs are more salient, such as Turkish, Korean, Mandarin or Tseltal, children show a verb rather than a noun bias (e.g., Casillas et al., 2024; Frank et al., 2021; Setoh et al., 2021; Yee, 2020). Supporting this pattern, both van Zeeland & Schmitt (2013) and Reynolds et al. (2015) found that nouns are more easily acquired than verbs in English as L2. Puimège & Peters (2019a) explained this effect by the fact that verbs can occur in many more different forms than nouns, which may hamper acquisition. Additionally, nouns typically refer to more concrete entities, which may also facilitate their acquisition (cf. dual coding principle, see above; Puimège & Peters, 2019a). However, a recent study that examined the acquisition of pseudowords and German words by English speakers, while manipulating both concreteness and grammatical class, found no evidence to support the latter claim (Martin & Tokowicz, 2020). This suggests a more distinct separation between the effects of concreteness and grammatical class.

A final word-level characteristic that is not necessarily learner-independent, but which has also been found to affect word processing and L2 acquisition, is cross-linguistic similarity (e.g., Schepens et al., 2016). Since both L1 and L2 words are probably stored in the same mental lexicon (Durrant et al., 2022), cross-linguistic influence may affect vocabulary knowledge. Cognates are a well-known representation of such similarities (cf. the cognate facilitation effect, Lemhöfer, Dijkstra & Michel, 2004). In this study, we consider a word a cognate when it is similar in terms of orthography and pronunciation in both the L1 and L2, following the definition by Zhang & Zhang (2022). Several studies with primary school students of English as a foreign language have shown that overlap, or even partial overlap, between L1 and L2 words benefits vocabulary learning (e.g., De Wilde et al., 2020; Puimège & Peters, 2019a). The cognate effect is typically larger for knowledge at the level of meaning recall compared to word recognition and in spoken texts compared to written texts (Peters & Webb, 2018; Peters, 2020).

It is important to note that all studies reviewed thus far have primarily focused on either university-educated students (e.g., Barclay & Pellicer-Sánchez, 2021; Crossley et al., 2016; Martin & Tokowicz, 2020; Peters & Webb, 2018; Reynolds et al., 2015; Willis & Ohashi, 2012; van Zeeland & Schmitt, 2013) or primary school students (De Wilde et al., 2020; De Wilde, 2023; Puimège & Peters, 2019a). Consequently, it remains uncertain whether these findings can be generalized to adult L2 learners with no or less extensive schooling in their L1.

1.2. L2 vocabulary learning and the role of educational background

Educational background, defined as the level of formal schooling an L2 learner has received in their L1 (Tarone, 2010), is inherently linked to literacy skills, given that schooling is a literate activity (Huettig & Mishra, 2014; UNESCO, 2022). Despite the sparse research on the influence of educational background and L1 literacy on L2 vocabulary acquisition, emerging studies underscore the significant impact these factors have. Deygers & Vanbuel (2022) revealed that learners with limited education, or LESLLA learners, exhibit markedly lower receptive vocabulary in L2 compared to their more educated peers, even when accounting for other variables such as L1, age, or out-of-school exposure.

As has been argued, this discrepancy is related to the lower metalinguistic abilities of LESLLA learners (Kurvers, 2015). In order to learn words, learners rely on the phonological loop, a working memory component which helps to decode, store and maintain novel words (Baddeley et al., 1985). Yet, the capacity to utilize phonological information effectively when encountering new spoken words varies significantly across individuals with different levels of schooling (Baddeley et al., 1985; Demoulin & Kolinsky, 2016). Literacy fosters the development of explicit visual language representations, enhancing word recognition and the connection between word forms and their meanings (Kolinsky, 2015). This advancement in visual representation not only aids written language processing but also improves oral language processing (Morais, Alegria & Content, 1987; Huettig & Mishra, 2014). Schooling and literacy, rather than age, are pivotal in developing the ability to segment words from spoken language, as seen in emergent literate children. Initially processing language in larger units, children begin to discern individual words as their literacy develops (Justino & Kolinsky, 2023; Havron & Arnon, 2017). Similar patterns have been found in non- and low-literate adults. For example, in a visual world eye-tracking experiment, Huettig, Singh and Mishra (2011) explored the extent to which adults with high (i.e., 15 years on average) and low levels (i.e., 2 years) of schooling relied on phonological or semantic information to map words onto objects. The study revealed that individuals with higher levels of education and literacy exhibit greater proficiency in utilizing phonological information for word-object mapping, whereas lower-educated adults struggle to employ this information effectively and primarily rely on semantic cues in the input. Moreover, Kurvers’ (2015) comparative research on the metalinguistic skills of non-literate L2 learners, L2 low-educated literates and young children revealed larger differences between unschooled and reading adult L2 learners than between unschooled adult L2 learners and preliterate L1 children. In particular, non-literate L2 learners appeared to have difficulty judging word length and syllogisms, and segmenting sentences into isolated words and words into phonemes.

Furthermore, research indicates that non-literate and low-literate adults face challenges not only in processing spoken language but also in storing and retrieving linguistic information. A series of elicited imitation and lexical decision experiments indicated that L1 adults with reduced schooling face difficulties with phonological tasks, such as repeating pseudowords, compared to tasks involving familiar words, where semantic knowledge can offset phonological limitations (Kosmidis et al., 2004, 2006). Similarly, Deygers (2020) found that LESLLA learners struggled with repeating phonological structures or pseudowords, whereas they performed similarly to university-educated learners on tasks that required the repetition of existing words. In summary, these findings illustrate the profound influence of formal education and literacy on metalinguistic abilities and the capacity of the phonological loop, which are crucial for learning new words (Godfroid et al., 2013; Nation, 2001).

1.3. Differential effects of word-level variables

The reviewed studies in 1.1 suggest that various word-level factors significantly influence the learnability of words, thus serving as reliable predictors of vocabulary knowledge in L2 learners. However, these word-based effects have been shown to vary in magnitude depending on learner-level variables. Indeed, individual differences in these effects (i.e., word frequency, cognateness, concreteness, PoS) have been reported in several studies.

Puimège & Peters (2019a) investigated the effects of age, exposure to L2 output outside the classroom and word features on L2 receptive vocabulary knowledge in 10–12-year-old Dutch learners of English as an L2, using the Peabody Picture Vocabulary Test (PPVT). They discovered that both the frequency effect and the effect of cognateness increased with age, suggesting that L2 vocabulary knowledge in terms of meaning recognition is more influenced by these factors in older learners. These findings were attributed to the expanded vocabulary size of the learners, who were still beginning L2 learners at the time of data collection. Consistent with psycholinguistic studies, the word frequency effect first increases with increased language exposure and then decreases again as the learner becomes more proficient in the language (Brysbaert et al., 2016).

Similarly, De Wilde et al. (2020) explored the relationship between word-related factors, proficiency level and L2 vocabulary knowledge in young EFL learners. They identified a significant and positive interaction effect between word frequency and L2 proficiency, indicating a stronger frequency effect in more proficient L2 learners. Drawing parallels with findings in L1 adults (e.g., Brysbaert et al., 2018), De Wilde et al. (2020) proposed that this effect may be attributed to increased exposure to high-frequency words. Additionally, they uncovered a significant interaction effect between word concreteness and L2 proficiency, suggesting that concreteness is a better predictor of vocabulary knowledge in low-proficient L2 learners, aligning with the idea that concrete words are easier to learn. Furthermore, a reversed interaction effect between L2 proficiency and cognates was observed, indicating that less proficient learners tend to rely on their L1 to infer the meaning of L2 words. No differential effects were found of PoS or word length.

What remains less clear is whether these effects vary depending on learners’ educational background in the L1. Given the crucial role of phonological abilities in acquiring novel words (Baddeley et al., 1998) and their apparent enhancement with schooling and literacy acquisition (e.g., Hu, 2013; Demoulin & Kolinsky, 2016; Kurvers, 2015), the interaction effect between word-level variables and educational background warrants further examination (Martin & Tokowicz, 2020). To date, differential effects of word-level variables in LESLLA learners and their higher-educated counterparts have not been explored. However, research with L1 children and non- or low-literate L1 adults provides intriguing insights into the word length effect and the word frequency effect.

For instance, Morra & Camba (2009) demonstrated that phonological sensitivity predicts the learning of long nonwords in 10–11-year-old children. Additionally, children with phonological processing difficulties did not exhibit sensitivity to the word length effect in experiments conducted by Palladino & Ferrari (2008). Their participants were EFL children who encountered challenges in learning foreign languages, all of whom experienced difficulties with phonological processing and phonological memory. This finding indicates a connection between phonological abilities and the word length effect.

The frequency effect appears to be less influenced by learners’ phonological abilities (Brysbaert et al., 2016). However, given that LESLLA learners possess a smaller vocabulary (Deygers & Vanbuel, 2022) and that the frequency effect is generally weaker in less proficient learners due to reduced exposure, it might operate differently across L2 learners with varying educational backgrounds. For example, it might not impact LESLLA learners as much as it does tertiary-educated L2 learners, who likely have additional exposure to language through written input. Moreover, as suggested by Brysbaert et al. (2016), meanings of some low-frequency words can be inferred by analyzing the word forms since these words are derivations, inflections or compounds of high-frequency words. Consequently, this may give an advantage to higher-educated L2 learners with stronger L1 literacy skills and more robust metalinguistic abilities.

2. Research questions

The aim of this study is to examine if and to what extent adult L2 learners’ vocabulary knowledge can be predicted by word-related variables and to what extent this effect is differential for L2 learners with diverging educational backgrounds. The main research question guiding this study was: Which factors predict receptive vocabulary knowledge in adult L2 learners, and LESLLA students in particular? This question is addressed in two subquestions:

1. To what extent do word characteristics (i.e., frequency, concreteness, word length, part of speech) predict L2 vocabulary knowledge in adult L2 learners?
2. To what extent are the effects of word characteristics differential according to student educational background, when controlling for other background variables (i.e., age, employment status, CEFR level, L1-L2 distance, length of residence, exposure to Dutch medium TV)?

Based on prior research examining the effects of word-level factors on L2 vocabulary knowledge, we expected to find main effects of all four word-level indicators and differential effects of the word-level variables for adults with diverging educational backgrounds. Since LESLLA learners typically have lower metalinguistic skills (Kurvers, 2015), and literacy and schooling affect oral processing (Huettig & Mishra, 2014), word features like word length, word concreteness or word frequency may well affect L2 receptive vocabulary knowledge differentially in LESLLA learners. Additionally, frequency effects tend to be stronger in reading than listening (e.g., Vidal, 2011), whereas concreteness and L1 effects are stronger in spoken input (Peters & Webb, 2018; Peters, 2020). Since LESLLA learners can mainly rely on spoken L2 input, both frequency and concreteness effects may be differential depending on students’ educational background. Understanding the interplay between educational background, literacy and vocabulary acquisition is crucial for developing effective instructional strategies.

3. Methodology

To answer the research questions, the dataset of Deygers’ cross-sectional correlation study (2023) was used. This dataset is freely available via OSF and consists of vocabulary performances of a large sample of adult learners of Dutch as L2 with diverging educational backgrounds.

3.1. Participants

Participants of this study were adult migrants who were enrolled in a formal Dutch L2 course. Data were collected from 1020 adult L2 learners from 11 schools in Flanders, Belgium. Depending on their educational background, estimated by means of a cognitive ability test which provides an indication of level of schooling (Verschueren et al., 2011), and their L1 literacy skills in a Latin script, estimated on the basis of thorough intake conversations by experts in the learners’ L1 (De Niel et al., 2016), participants enroll in a so-called extended, standard or accelerated track. Extended tracks are specifically organized for students who have had limited or interrupted experience with formal schooling in their home country and obtained at most a primary education degree (i.e., LESLLA learners). These learners are emergent functional literates in an alphabetic script, meaning they can likely read and understand isolated words or even sentences, but they struggle to comprehend extended written text in any language they speak. Accelerated tracks typically target students with a tertiary education degree, and standard tracks typically address learners with a degree of lower or upper secondary education (see Deygers & Vanbuel, 2022, for more details).

Table 1 provides an overview of the background variables of participants in this study, ranked by track type (i.e., educational background and L1 literacy level). Half of the students were taking an A1-level Dutch course at the time of data collection; the other half was enrolled in an A2-level course. Data from 134 students were omitted since no information on their educational background, length of residence or employment was available. Additionally, data from 123 learners who did not pass the level of the Dutch L2 course were removed in order to control for L2 proficiency.

Table 1. Descriptives of the participants by educational background

3.2. Measures

Participants’ vocabulary knowledge in Dutch was measured by means of the Peabody Picture Vocabulary Test in Dutch (PPVT-NL, Dunn & Dunn, 2005). The PPVT-NL measures students’ receptive knowledge at the level of meaning recognition by means of recorded single-word items with multiple-choice questions. Participants hear a word and select the one of four drawings that best matches the prompt. Sets 1 through 9 were administered, resulting in a total of 108 items per participant. Four words were excluded from the analyses because they were multiword expressions, for which no information on the word-level parameters was available in the corpora from which we extracted this information. Cronbach’s alpha for the remaining 104 items was .86.

Student background information was gathered by means of a brief written questionnaire with questions formatted in Dutch, French, English, Spanish, Russian, Polish, Turkish, Pashto, Arabic and Mandarin. In the extended tracks, all questionnaires were administered one-on-one to avoid difficulties with reading and interpretation of the questions. Participants were asked about their L1(s) and home languages, their age, gender, length of residence in Belgium, exposure to Dutch tv and radio, educational background and employment.

3.3. Procedure

Students completed the test using paper and pencil. The test was administered during regular classes in 2017 (Deygers, 2023; Deygers & Vanbuel, 2022). Given the scale of this study, the test was administered in a group rather than individually.

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Participants were told about the research goals and gave oral informed consent, a procedure that was approved by the Faculty’s Ethical Committee. For more details about the test administration, see Deygers & Vanbuel (2022).

3.4. Independent variables

This study operationalizes four word-level measures and eight student-level measures (i.e., educational background and control variables). Cross-linguistic similarity was not operationalized at the word level since the students in the dataset had 69 different L1s. Instead, it was operationalized as a student-level variable. At the word level, we included:

- Frequency: For each word, we included a frequency score which we obtained from the Subtlex corpus (Keuleers et al., 2010). This corpus is based on subtitles in movies and series and is thus based on oral input, which is similar to the test format that we used to measure vocabulary knowledge in this study. Frequency scores were included as Zipf scores, which range from 1 (very low-frequency words) to 6 (very high-frequency content words). The frequency scores are log-transformed with base 10 (i.e., the base-10 logarithm of the frequency per billion words) (Mean = 3.75, SD = .85, min = 1.36, max = 5.90).
- Part of speech (PoS): For each word, we included whether it was a verb, adjective or noun. Nouns made up the largest part of the words in the test (N = 70, 67.31%), followed by verbs (N = 24, 23.08%) and adjectives (N = 10, 9.62%).
- Word length: For each word, word length was measured by the number of letters and computed automatically via T-scan (Pander Maat et al., 2014). As one reviewer pointed out, operationalizing word length by the number of sounds might be more appropriate given the oral nature of the vocabulary test. However, we chose to use the number of graphemes over phonemes to maintain consistency with similar studies (e.g., De Wilde et al., 2020). Additionally, the correlation between the number of letters and the number of sounds in this study was very high (r = .92, p < .001), indicating that the two measures are closely aligned. On average, the words in our study consisted of 6.66 letters (SD = 2.29) and 5.90 sounds (SD = 2.22), with a minimum length of 3 letters and 2 sounds and a maximum length of 13 letters and 11 sounds.
- Concreteness: To determine the concreteness of the items, we used Brysbaert et al.’s (2014) concreteness values. These values are derived from a survey of 75 university students who were native speakers of Dutch and who rated a total of 6000 lemmas for concreteness from 1 (abstract) to 5 (concrete). The mean concreteness score for the words in our dataset is 4.17 (SD = .77), indicating that they are relatively concrete.
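The Zipf transformation mentioned under Frequency can be illustrated with a short sketch. It assumes the common definition of the Zipf value as the base-10 logarithm of a word's frequency per billion words (equivalently, log10 of the frequency per million words plus 3). The function names and the example counts are hypothetical, and published norms may apply smoothing that this sketch omits.

```python
import math


def zipf_from_counts(word_count: int, corpus_tokens: int) -> float:
    """Zipf value as log10 of the frequency per billion words.

    Assumption: no smoothing is applied, so word_count must be positive.
    """
    if word_count <= 0 or corpus_tokens <= 0:
        raise ValueError("counts must be positive")
    per_billion = word_count / corpus_tokens * 1e9
    return math.log10(per_billion)


def zipf_from_per_million(freq_per_million: float) -> float:
    """Equivalent form: log10(frequency per million words) + 3."""
    if freq_per_million <= 0:
        raise ValueError("frequency must be positive")
    return math.log10(freq_per_million) + 3


if __name__ == "__main__":
    # Hypothetical word occurring 44 times in a 44-million-token corpus,
    # i.e. once per million words.
    print(round(zipf_from_counts(44, 44_000_000), 2))
    print(round(zipf_from_per_million(1.0), 2))
```

On this scale, a one-unit increase corresponds to a tenfold increase in frequency, which is worth keeping in mind when reading the frequency coefficients reported later.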

Word-level variables showed slight to moderate correlations. Specifically, word length and frequency exhibited a moderate negative correlation (r = −.420, p < .001), while concreteness and frequency were positively but weakly correlated (r = .095, p < .001). Additionally, word length and concreteness were slightly negatively correlated (r = −.260, p < .001). Although these correlations were statistically significant, none surpassed a threshold of .8, indicating an absence of overall strong correlations that could pose a risk of multicollinearity.

PoS did not show a significant relationship with word frequency (F(2) = .053, p > .05, η² = .001) or word length (F(2) = 2.057, p > .05, η² = .06) either. However, there were significant and large differences among words from the different grammatical classes in terms of concreteness (F(2) = 46.132, p < .001, η² = .48). These differences will be taken into account in the analyses.

Information on students’ educational background and other background variables was gathered by means of a questionnaire, cross-checked with school data, and included the following:

- Educational background: For each student, a categorical variable was included referring to their educational background (i.e., maximal educational attainment). 21.1% of the students were LESLLA learners (that is, no degree or a degree of primary education alone), 43.1% had a degree of secondary education and 35.8% had a degree of tertiary education.
- CEFR level: 49.4% (N = 377) of the participants were enrolled in an A1-level track, 50.6% (N = 386) in an A2-level track.
- Job: A binary variable was included for employment, since out-of-school exposure to L2 input may affect L2 vocabulary knowledge (e.g., Puimège & Peters, 2019a). 35.6% (N = 272) of the students in the dataset were employed at the time of testing.
- L1–L2 distance: Following Jeon and Yamashita (2014), L1–L2 distance was measured by means of two indices, L1 Indo-European family and L1 Latin script:
  - L1 Indo-European: A dichotomous variable for L1 was included. 52.7% (N = 402) of the students in our dataset had an Indo-European L1.
  - L1 Latin writing system: A dichotomous variable for L1 writing system was also included. Research has shown that familiarity with the target script facilitates L2 word learning (see Shepperd, 2024, for an overview). 44.8% (N = 342) of the students used an L1 with a Latin writing system. The correlation between Indo-European L1 and Latin writing system was low (Cramer’s V = .252, p < .001).
- Time in Belgium: A continuous variable was included for the time period the student had lived in Belgium at the time of data collection. On average, participants had been in Belgium for 4.64 years (SD = 5.18) at the time of testing.
- Age: A continuous variable indicating students’ age at the time of testing was included as well, since age differed significantly across educational background (F(2) = 6.793, p < .001) and the correlation with time in Belgium was low (Pearson’s r = .33, p < .001). On average, students were 34 (SD = 10) years old. The youngest student in the dataset was 14 years old, the oldest 68.
- Exposure to Dutch-medium TV: A binary variable was included for media usage or exposure to Dutch outside of the classroom, since previous studies indicated that out-of-school exposure can be a strong predictor of L2 vocabulary knowledge (e.g., Puimège & Peters, 2019a). In total, 65.5% of the participants indicated that they watched television in Dutch.

3.5. Analysis

All participants were administered the same 108 items of the Dutch PPVT. Responses were scored dichotomously: a correct response received a score of 1, an incorrect response a score of 0. Items that were skipped were also scored as 0. To examine the impact of word- and student-level variables on vocabulary knowledge, logistic regression analyses were conducted. The dependent variable in this study was binary, i.e., whether a word is known. Since our data are hierarchically structured (i.e., words within students), we used two-level models for the analyses. This allows us to include random intercepts for words and students, and thus to take into account differences between words and individual learners (Hox, 2010). Continuous variables (i.e., age, time in Belgium, frequency, word length, concreteness) were all centered around the mean.
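The model structure described above can be written out as a random-intercept logistic regression. The following is a sketch in notation of our own choosing rather than the article's: $y_{sw}$ is the dichotomous score of student $s$ on word $w$, the three predictors are the mean-centered word-level variables, and $u_s$ and $v_w$ are the random intercepts for students and words.

```latex
\[
\begin{aligned}
\operatorname{logit}\,\Pr(y_{sw} = 1)
  &= \beta_0 + \beta_1\,\mathrm{freq}_w + \beta_2\,\mathrm{conc}_w
     + \beta_3\,\mathrm{len}_w + u_s + v_w, \\
u_s &\sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{student}}\right), \qquad
v_w \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{word}}\right).
\end{aligned}
\]
```

Because the predictors are mean-centered, $\beta_0$ is the log-odds of a correct response for an average student on a word of average frequency, concreteness and length; each $\exp(\beta_k)$ is the odds ratio associated with a one-unit increase in the corresponding predictor.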

Models were computed stepwise, relying on the Akaike Information Criterion (AIC) and deviance to find the model that best fitted the data. First, empty models containing only random effects for ‘word’ and ‘student’ were fitted. The intraclass correlation coefficient (ICC) was .865 for words and .135 for students, meaning that about 87% of the variance in the probability of knowing a word was situated between words and about 13% between students. As such, a two-level logistic regression model was suited to analyze our data.
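As an illustration of how such variance shares can be obtained from an empty model, the sketch below divides each random-intercept variance by the sum of the two. This is an assumption about the computation: the two reported values sum to 1, which suggests shares of the random-intercept variance, but the article does not state the formula. The variance values used here are hypothetical and were chosen only so that the output matches the reported shares. A second function shows the latent-variable ICC, which additionally counts the logistic residual variance of π²/3.

```python
import math


def random_variance_shares(var_word: float, var_student: float) -> tuple[float, float]:
    """Share of the summed random-intercept variance due to words and to students."""
    total = var_word + var_student
    return var_word / total, var_student / total


def latent_icc(var_word: float, var_student: float) -> tuple[float, float]:
    """Latent-variable ICCs, adding the logistic residual variance pi^2 / 3."""
    total = var_word + var_student + math.pi ** 2 / 3
    return var_word / total, var_student / total


if __name__ == "__main__":
    # Hypothetical variance components (not taken from the article).
    word_share, student_share = random_variance_shares(3.2, 0.5)
    print(round(word_share, 3), round(student_share, 3))
    print(tuple(round(v, 3) for v in latent_icc(3.2, 0.5)))
```

The two definitions answer slightly different questions, so it is worth stating which one is used when ICCs from logistic models are compared across studies.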

Next, the word-level variables ‘frequency’, ‘concreteness’, ‘part of speech’ and ‘word length’ were added to the model to answer RQ1. All continuous variables were centered around the mean. The model improved significantly when ‘frequency’, ‘concreteness’, ‘word length’ and ‘PoS’ were added. Yet, since there was a significant relationship between PoS and concreteness (F(2) = 42085.697, p < .001) and since the number of adjectives in the dataset was limited, PoS was omitted from further analyses. Table S1 in the online Supplementary Materials provides an overview of the model fit comparison.

As a third step, ‘educational background’ was added to the model, together with other relevant student-level variables (i.e., level, age, length of residence, L1, employment, exposure to Dutch). Only those variables that significantly contributed to the model were retained in order to avoid overfitting. Table S2 in the Supplementary Materials provides an overview of the model comparisons.

In a fourth step, interaction effects were computed between word-level variables and educational background to examine a differential impact of these variables for students with diverging educational backgrounds (RQ2). Data were analyzed in R Version 4.2.2 (R Core Team, 2022) using the ‘glmer’ function from the package ‘lme4’ (Bates et al., 2015).
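The stepwise comparison described in this section relies on AIC and deviance. A minimal sketch of that bookkeeping follows, assuming the standard definitions AIC = 2k − 2·logLik and, for illustration, deviance = −2·logLik. The class name, model labels, log-likelihoods and parameter counts are all hypothetical; the article's own values are in its supplementary tables, and the models themselves were fitted in R.

```python
from dataclasses import dataclass


@dataclass
class FittedModel:
    """Summary of a fitted model (hypothetical container, not an lme4 object)."""
    name: str
    log_likelihood: float
    n_parameters: int

    @property
    def deviance(self) -> float:
        # Assumption: deviance taken as -2 * log-likelihood.
        return -2.0 * self.log_likelihood

    @property
    def aic(self) -> float:
        return 2.0 * self.n_parameters - 2.0 * self.log_likelihood


def compare(simpler: FittedModel, richer: FittedModel) -> dict[str, float]:
    """Positive differences mean the richer model fits better."""
    return {
        "delta_deviance": simpler.deviance - richer.deviance,
        "delta_aic": simpler.aic - richer.aic,
        "extra_parameters": richer.n_parameters - simpler.n_parameters,
    }


if __name__ == "__main__":
    # Hypothetical log-likelihoods for an empty model and a model
    # with three word-level predictors added.
    empty = FittedModel("empty", -40000.0, 3)
    word_level = FittedModel("word-level", -39950.0, 6)
    print(compare(empty, word_level))
```

The deviance difference can be referred to a chi-square distribution with `extra_parameters` degrees of freedom for a likelihood-ratio test, while the AIC difference already penalizes the added parameters.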

4. Results

4.1. To what extent do word characteristics predict L2 vocabulary knowledge in adult L2 learners?

First, we inspected the aggregated correct responses over the words by educational background category. These descriptives show that the average L2 vocabulary knowledge as measured by the PPVT-III-NL increases with educational background. LESLLA learners have the lowest mean scores (M = 53.89, SD = 10.07, min = 26, max = 77), followed by students with a secondary education degree (M = 59.93, SD = 10.82, min = 6, max = 87), while students with a tertiary education degree have the highest mean scores (M = 64.82, SD = 11.74, min = 27, max = 96). The maximum possible score is 104 since we included 104 items in the analyses.

Logistic regression models with random effects for word and student indicated that the following word-level variables significantly predict vocabulary knowledge: frequency, concreteness and word length (Table 2). Word frequency has the largest odds ratio, indicating that the odds of knowing a word are more than three times higher for a more frequent word than for a less frequent one. Similar but smaller effects are found for concreteness and word length.
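Because odds ratios act on odds rather than on probabilities, the resulting change in probability depends on the starting point. The sketch below makes this concrete with a hypothetical odds ratio of 3 and hypothetical baseline probabilities; neither is taken from Table 2.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the probability obtained after multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds ratio must be positive")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical odds ratio of 3 applied to two different baselines.
    for baseline in (0.5, 0.8):
        print(baseline, round(apply_odds_ratio(baseline, 3.0), 3))
```

With a baseline of .50, tripling the odds raises the probability to .75; with a baseline of .80, the same odds ratio raises it only to about .92. The same coefficient therefore translates into different probability gains for easy and difficult words.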

Table 2. Parameter estimates of word-level factors

4.2. To what extent are the effects of word characteristics differential according to student educational background?

To address the second research question, we first added student-level variables to a two-level logistic regression model with item score as dependent variable, educational background as independent variable and other student characteristics (i.e., CEFR level, age, L1, employment status, length of residence, exposure to Dutch TV) as covariates. The results (Table 3) show that educational background significantly predicts word knowledge, on top of the following student-level variables: CEFR level, L1 and length of residence. Students with a degree of secondary education have a higher probability of knowing a word than LESLLA students. Students with a degree of tertiary education have an even higher probability than LESLLA students and students with a degree of secondary education (b = 0.25, SE = 0.06, z-value = 4.72, p < .001, Odds ratio = 1.29). The same is true for students in A2-level tracks compared to students in A1-level tracks. Students whose first language has a Latin writing system and is thus more comparable to Dutch also have a higher probability of knowing a target word in Dutch than students with a different L1 writing system. Additionally, employment status and length of residence also significantly predict the probability of knowing a word in the L2: adult learners with a job, and who have been in Belgium for a longer period, have a higher probability of knowing a word than adults who are unemployed and who arrived more recently in Belgium. Age, in contrast, does not predict L2 vocabulary knowledge (Table 3). The conditional R² indicates that the model including both word-level and student-level variables can explain 52% of the variance. In comparison, the marginal R², which indicates the predictive value of the fixed effects, is 17.6%.

Table 3. Parameter estimates student-level variables

Next, interaction effects between educational background and word features were added to the model (Table 4) in order to answer RQ2.

Table 4. Parameter estimates interaction effects

A significant interaction effect was found between word frequency and educational background. Relative to LESLLA students, the interaction was significant only for students with a degree of tertiary education (Table 4). Frequency is a good predictor of L2 vocabulary knowledge in LESLLA students, but the effect is even more pronounced in tertiary-educated students. The frequency effect also differs significantly between students with a degree of secondary and tertiary education (b = .081, SE = 0.032, z-value = 2.55, p < .05, Odds ratio = 1.08). Figure 1 illustrates the interaction effect between educational background and word frequency. It reveals that the gap in the probability of knowing a high- versus low-frequency word gradually widens with increasing educational background. Specifically, the word frequency effect appears more pronounced in university-educated L2 learners (i.e., educational background group 3) than in LESLLA learners (i.e., group 1).

Figure 1. Interaction between educational background and frequency (fitted values) (Note: edugroup 1: LESLLA learners, 2: secondary degree, 3: tertiary degree).

No significant interaction effects were found between educational background and concreteness, meaning that this word-level predictor did not differentially affect adults’ probability of knowing a word according to their educational background.

The analysis also revealed a significant interaction effect between educational background and word length. Students with a degree of secondary or tertiary education have a significantly higher probability of knowing longer words (i.e., words consisting of more letters) than LESLLA learners: 6% and 9%, respectively. The difference between students with a degree of secondary and tertiary education is also significant (b = .027, SE = 0.011, z-value = 2.575, p < .05), with students with a degree of tertiary education having higher odds (1.03) of knowing a longer word than students with a degree of secondary education. Figure 2 shows this interaction effect. Notably, there is no discernible gap between the probability of knowing long and short words in LESLLA learners (educational background group 1). In contrast, such a gap is evident among university-educated learners (educational background group 3).
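The odds ratios quoted in this section follow from exponentiating the logistic regression coefficients. The sketch below recomputes them from the rounded coefficients reported in the text; small discrepancies are expected because the published coefficients are themselves rounded. The dictionary labels and function names are ours.

```python
import math

# (coefficient b, odds ratio as reported in the text)
REPORTED = {
    "tertiary vs. secondary, main effect": (0.25, 1.29),
    "frequency x tertiary vs. secondary": (0.081, 1.08),
    "word length x tertiary vs. secondary": (0.027, 1.03),
}


def odds_ratio(b: float) -> float:
    """Odds ratio implied by a log-odds coefficient."""
    return math.exp(b)


def percent_change_in_odds(ratio: float) -> float:
    """Percentage change in the odds implied by an odds ratio."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    for label, (b, reported_or) in REPORTED.items():
        print(f"{label}: exp({b}) = {odds_ratio(b):.3f} (reported {reported_or})")
    # The 6% and 9% figures for word length correspond to odds ratios of 1.06 and 1.09.
    print(round(percent_change_in_odds(1.06), 1), round(percent_change_in_odds(1.09), 1))
```

Read this way, the 6% and 9% figures for word length are increases in the odds per additional letter relative to LESLLA learners, not increases in percentage points of probability.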

Figure 2. Interaction between educational background and word length (fitted values) (Note: edugroup 1: LESLLA learners, 2: secondary degree, 3: tertiary degree).

5. Discussion and conclusion

This study examined the relationship between word characteristics, educational background and receptive vocabulary knowledge in a sample of L2 adult learners with diverging educational backgrounds. Specifically, this study aimed to contribute to previous studies that have shown how word features such as frequency or concreteness level influence the difficulty with which a word is learned (e.g., Puimège & Peters, 2019a; Willis & Ohashi, 2012). Yet, since literacy and schooling impact metalinguistic skills such as phonological awareness (Huettig & Mishra, 2014; Kurvers, 2015), which help to detect and rehearse new words (e.g., Candry et al., 2017), it is likely that these two factors have a different influence on adult learners with different educational backgrounds.

The findings of this study confirm previous research by showing that frequency, concreteness and length are word features that predict whether adult L2 learners know a word or not. Words that are more frequent are more than three times as likely to be known as less frequent words (or an increase of 78%). The probability of knowing a word also increases when a word is more concrete (63%), or longer (54% increase per letter). These findings confirm the frequency and concreteness effects that are found in prior word recognition studies (e.g., Brysbaert et al., 2014; Ferrand et al., 2018) and seem to provide evidence for usage-based theories of SLA (Crossley et al., 2016) and the dual coding theory (Paivio, 2013). The findings related to word length, however, contrast with studies that found that adult L2 learners typically know more short words than long words (e.g., Reynolds et al., 2015). The impact of word length on the learnability of a word is not uncontested, however, and might be influenced by input modality, or by other word factors that act as control variables in these studies. In an experiment on incidental vocabulary learning from multimodal input, Puimège & Peters (2019b) found that young L2 learners were more likely to know longer words than shorter words. They explained this finding by referring to the salience of longer words in spoken input, which may lead to noticing and learning (Crossley et al., 2016). Moreover, studies examining the effects of word length on word learning – including this study – typically control for word characteristics like polysemy and frequency. Reynolds et al. (2015), for instance, included word length, polysemy and frequency as word variables in their statistical model. However, since shorter words were significantly more polysemous (with a correlation of −.250, p < .05 between number of phonemes and level of polysemy), and learners in their sample knew significantly more high-frequency polysemous words, the effect of word length may have been confounded. Additionally, the effect of word length may behave differently in samples that deviate from the mainstream SLA research population. Both Reynolds et al. (2015) and Willis & Ohashi (2012) used university students as their participants.

Findings show that educational background is the strongest student-level predictor of L2 receptive vocabulary knowledge at the level of meaning recognition. An increase in the highest educational degree raises the probability of knowing a word by 57% (in the case of secondary education) or 62% (tertiary education) compared to having no or only limited experience with schooling in the L1, even when controlling for other background variables. This finding is in line with the studies available on the effects of schooling and/or literacy in adult L2 learning, which showed that both student-level characteristics significantly predict L2 proficiency (e.g., Deygers & Vanbuel, 2022). Previous research by Kurvers (2015) already indicated that LESLLA learners may face greater difficulty in detecting word boundaries. Additionally, eye-tracking studies in L1 adults and highly educated L2 learners show that ‘lexical processing is aided by active and stable connections between the orthographic and phonological representations (phonological processing) of words and between orthographic form and meaning (vocabulary knowledge)’ (Schmidtke & Moro, 2020, p. 283). Perhaps, then, the weaker visual representations of words in the minds of LESLLA learners make it much more challenging for them to recognize words from input and acquire them, compared to highly educated adults.

Apart from educational background, this study showed that other factors such as CEFR level (A2 compared to A1) or L2 proficiency, L1–L2 distance (as measured by Latin script and Indo-European L1), length of residence and employment significantly predict an adult L2 learner’s probability of knowing a word receptively. The more proficient a learner is in the L2, the more similar their L1 and L2 are and the longer and/or more often they have been exposed to the L2, the higher the probability that they know more L2 words. These findings confirm prior research (e.g., De Wilde et al., 2020) and are consistent with the importance of usage in learning L2 words.

This study also examined the interaction effect between educational background and word-level features. The interaction effects add to the research consensus by showing that the impact of word frequency and word length differs by educational background. Although the effect of word frequency is strong in LESLLA learners, it is stronger in highly educated adults. This is also true for word length: duration of schooling is proportionate to the gap between the probability of knowing longer and shorter words. In fact, in LESLLA students, the effect of word length is absent, meaning that word length is not a sound predictor of vocabulary knowledge in these learners when controlling for word frequency and level of concreteness.

An explanation for the differential word frequency effect is related to learners’ vocabulary size. As argued by Brysbaert et al. (2018), the word frequency effect may depend on the exposure level: individuals with more exposure to words show a stronger frequency effect for both high and less frequent words, whereas individuals with less exposure would show a frequency effect only for high-frequency words (see also De Wilde et al., 2020). Since the overall vocabulary size of LESLLA learners was smaller than that of more highly educated L2 learners, the differential word frequency effect is probably a consequence of vocabulary size.

The findings of the differential word length effect may provide support for the literacy hypothesis (Huettig & Mishra, 2014; Kurvers, 2015), which posits that literacy changes how input is processed and learned (Kurvers et al., 2015). Our findings show that for higher educated L2 learners, it may be easier to acquire longer words than for LESLLA learners. As a general rule, the acquisition of longer words requires more processing and rehearsing compared to shorter words (e.g., Barclay & Pellicer-Sánchez, 2021). For LESLLA learners, however, acquiring longer words may prove to be a more challenging task at this proficiency level (A1/A2) than for learners who have received more years of formal education. While highly educated L2 learners can rely on their phonological decoding skills to process longer, more difficult words, this might not be the case for LESLLA learners.

In addition, the interaction effect between word length and educational background, as well as between frequency and educational background, might be explained by the differential ways in which learners with diverging backgrounds encounter L2 words. Longer words are typically less frequent, and more present in academic texts, to which LESLLA learners are likely less exposed. In addition, Peters & Webb (2018) showed that the frequency effect is stronger in written than in spoken input. Perhaps highly educated L2 learners have (additional) access to L2 vocabulary through written input since they can transfer L1 reading skills to their L2 (i.e., Alderson’s threshold hypothesis, Hulstijn, 2015), whereas LESLLA students rely more heavily on spoken input. In sum, these findings indicate that educational background impacts the effects of word-level factors on L2 vocabulary knowledge.

6. Limitations and implications

While this study adds to the knowledge base on vocabulary knowledge, some limitations should be mentioned. Educational background, as operationalized in this study, is a proxy for other, more fine-grained student-level characteristics such as metalinguistic awareness or L1 literacy skills (Kolinsky, 2015). In order to gain a more detailed understanding of how low literacy and schooling affect vocabulary knowledge, future studies might want to look into the effects of metalinguistic skills and L1 literacy directly. In addition, L2 vocabulary knowledge was measured by means of the PPVT (Dunn & Dunn, 2005). As with most other receptive vocabulary measures, this test is multiple choice, which allows for guessing (Webb, 2008). As such, the test results may be an overestimation of students’ vocabulary knowledge. Additionally, only one vocabulary measure was used to measure the effects of word features and student-level variables, which may overestimate the effects of certain factors on L2 vocabulary knowledge in general (cf. Puimège & Peters, 2019b). Future research could examine to what extent these findings also apply to other types of vocabulary knowledge, besides the form-meaning link at the level of recognition. Future studies may also want to look into the use of other measures to tap into the learning process. While vocabulary knowledge is often used to examine how difficult it is to learn particular words, this type of measure does not provide information on how heavy the learning burden is or how difficult the learning process itself has been (but see Barclay & Pellicer-Sánchez, 2021).

Regardless of these limitations, this study is the first to examine the effects of word-level variables on L2 vocabulary knowledge in adult learners with diverging educational backgrounds, thereby contributing to the generalizability of SLA empirical findings. Additionally, the outcomes of this study can inform L2 teachers in their selection of L2 vocabulary taught in class. Since students with diverging educational backgrounds show different patterns of vocabulary knowledge, different words might need to be put into focus during class. For instance, for LESLLA learners, it may be important to focus on longer and less frequent words, whereas highly educated L2 students may benefit from a focus on shorter and less frequent words.

Supplementary material

To view supplementary material for this article, please visit http://doi.org/10.1017/S1366728924000889.

Data Availability

The dataset used in this paper can be accessed via osf.io/qgbsk.

Acknowledgements

I would like to thank Bart Deygers, Benjamin Kremmel, Maribel Montero Perez, Vanessa De Wilde and two anonymous reviewers for their thoughtful comments on earlier versions of this manuscript. This research was funded by FWO Grant Number 12W6622N.

Competing interest

The author(s) declare none.

Footnotes

This article has earned badges for transparent research practices: Open Data. For details, see the Data Availability Statement.

References

Andringa, S., & Godfroid, A. (2020). Sampling bias and the problem of generalizability in applied linguistics. Annual Review of Applied Linguistics, 40, 134–142. https://doi.org/10.1017/S0267190520000033CrossRef Google Scholar

Barclay, S., & Pellicer-Sánchez, A. (2021). Exploring the learning burden and decay of foreign language vocabulary knowledge. The effect of part of speech and word length. ITL – International Journal of Applied Linguistics, 172(2), 259–289. https://doi.org/10.1075/itl.20011.bar

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105(1), 158–173. https://doi.org/10.1037/0033-295x.105.1.158

Baddeley, A., Logie, R., Nimmo-Smith, I., & Brereton, N. (1985). Components of fluent reading. Journal of Memory and Language, 24(1), 119–131. https://doi.org/10.1016/0749-596x(85)90019-1

Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., & Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80–84. https://doi.org/10.1016/j.actpsy.2014.04.010

Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42(3), 441–458. https://doi.org/10.1037/xhp0000159

Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1), 45–50. https://doi.org/10.1177/0963721417727521

Candry, S., Deconinck, J., & Eyckmans, J. (2017). Metalinguistic awareness in L2 vocabulary acquisition: Which factors influence learners’ motivations of form-meaning connections? Language Awareness, 26(3), 226–243. https://doi.org/10.1080/09658416.2017.1400040

Casillas, M., Foushee, R., Méndez Girón, J., Polian, G., & Brown, P. (2024). Little evidence for a noun bias in Tseltal spontaneous speech. First Language (online first). https://doi.org/10.1177/01427237231216571

Crossley, S. A., Kyle, K., & Salsbury, T. (2016). A usage-based investigation of L2 lexical acquisition: The role of input and output. The Modern Language Journal, 100, 702–715. https://doi.org/10.1111/modl.12344

Carlsen, C. H., & Rocca, L. (2021). Language Test Misuse. Language Assessment Quarterly, 18(5), 477–491. https://doi.org/10.1080/15434303.2021.1947288

De Wilde, V., Brysbaert, M., & Eyckmans, J. (2020). Learning English through out-of-school exposure: How do word-related variables and proficiency influence receptive vocabulary learning? Language Learning, 70(2), 349–381. https://doi.org/10.1111/lang.12380

Demoulin, C., & Kolinsky, R. (2016). Does learning to read shape verbal working memory? Psychonomic Bulletin & Review, 23, 703–722. https://doi.org/10.3758/s13423-015-0956-7

Deygers, B. (2023, March 12). Educational background and receptive vocabulary [Dataset]. osf.io/qgbsk

Deygers, B. (2020). Elicited imitation: A test for all learners?: Examining the EI performance of learners with diverging educational backgrounds. Studies in Second Language Acquisition, 42(5), 933–957.

Deygers, B., & Vanbuel, M. (2022). Gauging the impact of literacy and educational background on receptive vocabulary test scores. Language Testing, 39(2), 191–211. https://doi.org/10.1177/02655322211049097

De Wilde, V. (2023). The auditory picture vocabulary test for English L2: A spoken receptive meaning-recognition test intended for Dutch-speaking L2 learners of English. Language Teaching Research, 1–31 (Online First). https://doi.org/10.1177/13621688221147462

Dunn, L. M., & Dunn, L. M. (2005). Peabody Picture Vocabulary Test-III-NL. Handleiding. Nederlandse versie Liesbeth Schlichting. Harcourt Test Publishers.

Durrant, P., Siyanova-Chanturia, A., Kremmel, B., & Sonbul, S. (2022). Research Methods in Vocabulary Studies (Vol. 2). John Benjamins Publishing Company. https://doi.org/10.1075/rmal.2

Ellis, N. C., & Beaton, A. (1993). Psycholinguistic determinants of foreign language vocabulary learning. Language Learning, 43, 559–617. https://doi.org/10.1111/j.1467-1770.1993.tb00627.x

Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2021). Variability and consistency in early language learning: The Wordbank project. MIT Press.

Ferrand, L., Méot, A., Spinelli, E., New, B., Pallier, C., Bonin, P., … & Grainger, J. (2018). MEGALEX: A megastudy of visual and auditory word recognition. Behavior Research Methods, 50, 1285–1307.

Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity versus natural partitioning (Technical Report No. 257). Center for the Study of Reading, National Institute of Education. https://eric.ed.gov/?id=ED219724

Goodman, J. C., Dale, P. S., & Li, P. (2008). Does frequency count? Parental input and the acquisition of vocabulary. Journal of Child Language, 35(3), 515–531. https://doi.org/10.1017/S0305000907008641

Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words: Gauging the Role of Attention in Incidental L2 Vocabulary Acquisition by Means of Eye-Tracking. Studies in Second Language Acquisition, 35(3), 483–517. https://doi.org/10.1017/s0272263113000119

Havron, N., & Arnon, I. (2017). Minding the gaps: Literacy enhances lexical segmentation in children learning to read. Journal of Child Language, 44(6), 1516–1538. https://doi.org/10.1017/S0305000916000623

Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). Routledge/Taylor & Francis Group.

Hu, H. M. (2013). The Effects of Word Frequency and Contextual Types on Vocabulary Acquisition from Extensive Reading: A Case Study. Journal of Language Teaching and Research, 4(3). https://doi.org/10.4304/jltr.4.3.487-495

Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 23, 403–430.

Huettig, F., & Mishra, R. K. (2014). How literacy acquisition affects the illiterate mind – A critical examination of theories and evidence. Language and Linguistics Compass, 8(10), 401–427. https://doi.org/10.1111/lnc3.12092

Hulstijn, J. H. (2015). Language proficiency in native and non-native speakers. Theory and research. John Benjamins Publishing Company. https://doi.org/10.1075/lllt.41

Huettig, F., Singh, N., & Mishra, R. K. (2011). Language-Mediated Visual Orienting Behavior in Low and High Literates. Frontiers in Psychology, 2(285). https://doi.org/10.3389/fpsyg.2011.00285

Jeon, E. H., & Yamashita, J. (2014). L2 Reading Comprehension and Its Correlates: A Meta-Analysis. Language Learning, 64(1), 160–212. https://doi.org/10.1111/lang.12034

Justino, J., & Kolinsky, R. (2023). Eye movements during reading in beginning and skilled readers: Impact of reading level or physiological maturation? Acta Psychologica, 236, 103927. https://doi.org/10.1016/j.actpsy.2023.103927

Keuleers, E., Brysbaert, M., & New, B. (2010). SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods, 42, 643–650. https://doi.org/10.3758/BRM.42.3.643

Kosmidis, M. H., Folia, V., Lahou, C. H., & Kiosseoglou, G. (2004). Semantic and phonological processing in illiteracy. Journal of the International Neuropsychological Society, 10, 818–827.

Kosmidis, M. H., Tsapkini, K., & Folia, V. (2006). Lexical processing in illiteracy: Effect of literacy or education? Cortex, 42(7), 1021–1027. https://doi.org/10.1016/S0010-9452(08)70208-9

Kurvers, J. (2015). Emerging literacy in adult second-language learners: A synthesis of research findings in the Netherlands. Writing Systems Research, 7(1), 58–78. https://doi.org/10.1080/17586801.2014.943149

Kurvers, J., van de Craats, I., & van Hout, R. (2015). Footprints for the future: Cognition, literacy and second language learning by adults. In van de Craats, I., Kurvers, J., & van Hout, R. (Eds.), Adult literacy, second language and cognition (pp. 7–32). CLS.

Kolinsky, R. (2015). How Learning to Read Influences Language and Cognition. In Pollatsek, A., & Treiman, R. (Eds.), The Oxford Handbook of Reading (pp. 377–394). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199324576.013.29

Laufer, B. (1997). What’s in a word that makes it hard or easy? Intralexical factors affecting vocabulary acquisition. In Schmitt, N., & McCarthy, M. (Eds.), Vocabulary: Description, acquisition, and pedagogy (pp. 140–155). Cambridge University Press.

Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21(2), 202–226. https://doi.org/10.1191/0265532204lt277oa

Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15.

Lemhöfer, K., Dijkstra, T., & Michel, M. (2004). Three languages, one ECHO: Cognate effects in trilingual word recognition. Language and Cognitive Processes, 19(5), 585–611. https://doi.org/10.1080/01690960444000007

Martin, K. I., & Tokowicz, N. (2020). The grammatical class effect is separable from the concreteness effect in language learning. Bilingualism: Language and Cognition, 23(3), 554–569. https://doi.org/10.1017/S1366728919000233

Morais, J., Castro, S.-L., Scliar-Cabral, L., Kolinsky, R., & Content, A. (1987). The effects of literacy on the recognition of dichotic words. Quarterly Journal of Experimental Psychology, 39A, 451–465. https://doi.org/10.1080/14640748708401798

Morra, S., & Camba, R. (2009). Vocabulary learning in primary school children: Working memory and long-term memory components. Journal of Experimental Child Psychology, 104(2), 156–178.

Nation, I. S. P. (2001). Learning vocabulary in another language (Cambridge Applied Linguistics). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139524759

Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63, 59–81. https://doi.org/10.3138/cmlr.63.1.59

Nation, P. (2020). The different aspects of vocabulary knowledge. In Webb, S. (Ed.), The Routledge handbook of vocabulary studies (pp. 15–29). Routledge. https://doi.org/10.4324/9780429291586-2

Nishiyama, R. (2020). Adaptive use of semantic representations and phonological representations in verbal memory maintenance. Journal of Memory and Language, 111, 104084. https://doi.org/10.1016/j.jml.2019.104084

Ortega, L. (2005). For what and for whom is our research? The ethical as transformative lens in instructed SLA. Modern Language Journal, 89(3), 427–443. https://doi.org/10.1111/j.1540-4781.2005.00315.x

Palladino, P., & Ferrari, M. (2008). Phonological sensitivity and memory in children with a foreign language learning difficulty. Memory (Hove, England), 16(6), 604–625. https://doi.org/10.1080/09658210802083072

Pander Maat, H., Kraf, R., van den Bosch, A., van Gompel, M., Kleijn, S., Sanders, T., & van der Sloot, K. (2014). T-scan: A new tool for analyzing Dutch text. Computational Linguistics in the Netherlands Journal, 4, 53–74.

Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading: An eye-tracking study. Studies in Second Language Acquisition, 38(1), 97–130. https://doi.org/10.1017/S0272263115000224

Peters, E. (2020). Factors affecting the learning of single-word items. In Webb, S. (Ed.), The Routledge handbook of vocabulary studies (pp. 125–142). Routledge. https://doi.org/10.4324/9780429291586-9

Peters, E., & Webb, S. (2018). Incidental vocabulary acquisition through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition, 40, 551–577. https://doi.org/10.1017/S0272263117000407

Puimège, E., & Peters, E. (2019a). Learners’ English vocabulary knowledge prior to formal instruction: The role of learner-related and word-related variables. Language Learning, 69(4), 943–977. https://doi.org/10.1111/lang.12364

Puimège, E., & Peters, E. (2019b). Learning L2 vocabulary from audiovisual input: An exploratory study into incidental learning of single words and formulaic sequences. The Language Learning Journal, 47(4), 424–438. https://doi.org/10.1080/09571736.2019.1638630

Reynolds, B. L., Wu, W.-H., Liu, H.-W., Kuo, S.-Y., & Yeh, C.-H. (2015). Towards a model of advanced learners’ vocabulary acquisition: An investigation of L2 vocabulary acquisition and retention by Taiwanese English majors. Applied Linguistics Review, 6(1), 121–144. https://doi.org/10.1515/applirev-2015-0006

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Schepens, J. J., van der Slik, F., & van Hout, R. (2016). L1 and L2 distance effects in learning L3 Dutch. Language Learning, 66(1), 224–256. https://doi.org/10.1111/lang.12150

Schmitt, N. (2014). Size and depth of vocabulary knowledge: What the research shows. Language Learning, 64(4), 913–951. https://doi.org/10.1111/lang.12077

Schmitt, N., Dunn, K., O’Sullivan, B., Anthony, L., & Kremmel, B. (2021). Introducing knowledge-based vocabulary lists (KVL). TESOL Journal, 12(4), e622. https://doi.org/10.1002/tesj.622

Setoh, P., Cheng, M., Bornstein, M. H., & Esposito, G. (2021). Contrasting lexical biases in bilingual English–Mandarin speech: Verb-biased mothers, but noun-biased toddlers. Journal of Child Language, 48(6), 1185–1208. https://doi.org/10.1017/S0305000920000720

Shepperd, L. (2024). Cross-scriptal orthographic influence on second language phonology. Languages, 9(210), 1–28. https://doi.org/10.3390/languages9060210

Schmidtke, D., & Moro, A. L. (2020). Determinants of Word-Reading Development in English Learner University Students: A Longitudinal Eye Movement Study. Reading Research Quarterly, 56(4), 819–854. https://doi.org/10.1002/rrq.362

Tarone, E. (2010). Second language acquisition by low-literate learners: An under-studied population. Language Teaching, 43(1), 75–83. https://doi.org/10.1017/S0261444809005734

UNESCO. (2022). Global Education Monitoring Report 2021/2: Non-state Actors in Education: Who Chooses? Who Loses? In Global Education Monitoring Report. United Nations. https://doi.org/10.18356/9789210022279

Vágvölgyi, R., Coldea, A., Dresler, T., Schrader, J., & Nuerk, H.-C. (2016). A review about functional illiteracy: Definition, cognitive, linguistic, and numerical aspects. Frontiers in Psychology, 7, 1617. https://doi.org/10.3389/fpsyg.2016.01617

van Zeeland, H., & Schmitt, N. (2013). Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4), 457–479. https://doi.org/10.1093/applin/ams074

Verhagen, J., Van Stiphout, M., & Blom, E. (2022). Determinants of early lexical acquisition: Effects of word- and child-level factors on Dutch children’s acquisition of words. Journal of Child Language, 49(6), 1193–1213. https://doi.org/10.1017/S0305000921000635

Verschueren, K., Buyse, E., Germeijs, V., Janssen, R., Magez, W., Van Nijlen, D., Buysse, G., Vangoetsenhoven, S., Arkens, T., & Doumen, S. (2011). Evaluatie en aanpassing van de Covaar-II. https://www.esf-agentschap.be/sites/default/files/attachments/articles/eindrapport_eif_project_covaar_ii.pdf

Vidal, K. (2011). A comparison of the effects of reading and listening on incidental vocabulary acquisition. Language Learning, 61, 219–258. https://doi.org/10.1111/j.1467-9922.2010.00593.x

Webb, S. (2008). The effects of context on incidental vocabulary learning. Reading in a Foreign Language, 20(2), 232–245.

Webb, S. (2020). Incidental vocabulary learning. In Webb, S. (Ed.), The Routledge handbook of vocabulary studies (pp. 225–239). Routledge.

Willis, M., & Ohashi, Y. (2012). A model of L2 vocabulary learning and retention. The Language Learning Journal, 40(1), 125–137. https://doi.org/10.1080/09571736.2012.658232

Yee, S. (2020). Is noun bias universal? Evidence from Chinese and Korean compared with French and English. Studies in the Linguistic Sciences: Illinois Working Papers, 32–44. https://core.ac.uk/download/359141659.pdf

Zhang, S., & Zhang, X. (2022). The relationship between vocabulary knowledge and L2 reading/listening comprehension: A meta-analysis. Language Teaching Research, 26(4), 696–725. https://doi.org/10.1177/1362168820913998

Table 1. Descriptives of the participants by educational background

Table 2. Parameter estimates of word-level factors

Table 3. Parameter estimates student-level variables

Table 4. Parameter estimates interaction effects

Figure 1. Interaction between educational background and frequency (fitted values) (Note: edugroup 1: LESLLA learners, 2: secondary degree, 3: tertiary degree).

Figure 2. Interaction between educational background and word length (fitted values) (Note: edugroup 1: LESLLA learners, 2: secondary degree, 3: tertiary degree).
