Introduction
In the area of second language (L2) speech research, many scholars have sought to understand which factors contribute to the process and product of learners’ successful phonological acquisition (Trofimovich, Kennedy & Foote, 2015). A large number of studies have reported that L2 speech outcomes are strongly linked to the quantity and quality of a learner's L2 experience (i.e., more practice is better; see Flege, 2016, for an overview), to aptitude (Baker-Smemoe & Haslam, 2013), and to sociopsychological factors such as motivation (Liu & Huang, 2011). However, few of these studies have justified their selection of individual differences (IDs) using a theoretical model, and most have focused exclusively on either the cognitive or the sociopsychological aspects of IDs. The current study departed from this trend and sought to unravel the complexities of classroom-based L2 pronunciation learning from a Dynamic Systems Theory (DST) perspective. Working with 73 college-level Japanese speakers of English, we conducted both cross-sectional and longitudinal investigations of the relative weights of experiential, cognitive, and sociopsychological factors in adult L2 speech learning. In the cross-sectional phase (Study 1), we examined the relationship between participants' experiential, cognitive, and sociopsychological profiles and two different aspects of L2 oral proficiency: comprehensibility (i.e., how difficult it is to understand what the speaker is saying) and accentedness (i.e., how heavily a speaker's speech is affected by his/her native language; Derwing & Munro, 2013). In the longitudinal phase (Study 2), we tracked the same participants’ L2 comprehensibility and accentedness development while they received four weeks of explicit pronunciation instruction.
Background
Individual differences in SLA research
Over the past 50 years, much scholarly attention has been given to examining how the process and product of L2 learning are characterized by various contextual, experiential, cognitive, and sociopsychological factors. Existing studies tend to focus on either cognitive or psychological aspects, and little attempt has been made to investigate IDs holistically by examining both at the same time (Serafini, 2017). However, scholars have begun to call for a more integrative approach with which to explore how individual learners with varied profiles of experience, cognition, motivation, and emotion can develop different dimensions of language (e.g., Ortega, 2013). One such framework could be Dynamic Systems Theory (DST). DST is an approach, or a meta-theory (Larsen-Freeman, 2013), that consists of a set of principles for exploring the changes in complex systems. The theory holds that such changes are sensitive to initial states, are resource-dependent and non-linear, and exhibit emergent outcomes when systems stabilize at attractor states (e.g., de Bot, 2008). A particular system consists of multiple components, and the interaction between the components characterizes the state of the system (de Bot, 2008). Identifying the operating rules of these components allows for robust interpretations to be made about system behavior. From a DST perspective, learner-external and learner-internal factors can be considered to be components that shape developmental changes in language systems (e.g., Hiver & Al-Hoorie, 2016).
Another integrative approach towards individual differences concerns cognitive psychologists’ account of the human mind, i.e., the trilogy of mind. Under this view, human intellectual functioning consists of motivation, emotion, and cognition (e.g., Matthews & Zeidner, 2004). Researchers typically categorize learner-internal IDs into these three domains (i.e., cognition, motivation, and emotion), and stress that it is crucial to give them each equal attention (Waninge, 2015). Furthermore, in the context of L2 pronunciation research, Moyer (2014) has shown that L2 learners who can produce near-nativelike L2 pronunciation often show superior scores on multiple IDs (both cognitive and sociopsychological IDs), suggesting a synergistic effect in the context of L2 pronunciation learning.
Following these lines of thought, we propose that L2 pronunciation acquisition can be conceptualized as a multidimensional and complex phenomenon. To disentangle its complex mechanisms, the current study took a first step towards exploring how cognitive and sociopsychological IDs dynamically interact to shape two different dimensions of the L2 pronunciation learning process (comprehensibility vs. accentedness) from multiple angles (cross-sectional vs. longitudinal).
Roles of individual differences in second language pronunciation learning
To date, researchers have extensively examined a range of IDs hypothesized to predict success in L2 pronunciation learning. For example, many studies have explored the role of different cognitive abilities in attaining advanced L2 pronunciation perception and/or production performance. Variables investigated to date have included working memory (e.g., Hu, Ackermann, Martin, Erb, Winkler & Reiterer, 2013), attention control (Darcy, Park & Yang, 2015), musical aptitude (Li & DeKeyser, 2017), domain general auditory processing (Saito, Sun & Tierney, 2020), foreign language aptitude (Saito & Hanzawa, 2016) and personality profiles (Hu & Reiterer, 2009). Other scholars have suggested that social and psychological factors impact learning. For instance, factors such as ethnic group affiliation (Gatbonton & Trofimovich, 2008), contextual attitude (Huensch & Thompson, 2017), language awareness (Kennedy & Trofimovich, 2010), motivation to learn an L2 (e.g., Nagle, 2018a), and degree of anxiety towards learning an L2 (Baran-Łucarz, 2016; Sardegna, Lee & Kusey, 2014) have been found to affect pronunciation attainment and performance. In what follows, we provide a selective overview of past research evidence on IDs in relation to L2 pronunciation learning in the classroom setting.
Foreign language aptitude
Many scholars have attributed exceptionality in L2 pronunciation to some underlying talent, which researchers have called aptitude (e.g., Muñoz & Singleton, 2007). Foreign language learning aptitude refers to the set of specialized cognitive factors that are thought to play a role in language learning (Li, 2016). According to Carroll's (1962) influential model, aptitude consists of phonemic coding ability, grammatical sensitivity, inductive learning, and associative memory. To respond to the growing interest in both implicit and explicit learning aptitudes (Suzuki & DeKeyser, 2015), several batteries have been developed since the Modern Language Aptitude Test (MLAT; Carroll & Sapon, 1959), including the LLAMA (Meara, 2005), the CANAL-F test (Grigorenko, Sternberg & Ehrman, 2000), and Hi-LAB (Doughty et al., 2010). Among these, the LLAMA tests have been widely used in the field of SLA to measure both implicit (sound sequence recognition) and explicit (associative memory, phonemic coding and grammatical inferencing) learning aptitude (Granena, 2013). Cross-sectional and longitudinal studies of aptitude suggest that (a) different explicit learning aptitudes work on different aspects of L2 speech development, and (b) explicit and implicit aptitudes determine different stages of speech development (Baker-Smemoe & Haslam, 2013; Hu et al., 2013; Saito & Hanzawa, 2016).
Saito and Hanzawa (2016) reported that Japanese L2 English learners’ aptitude scores (a composite of four sub-tests measured via LLAMA) showed positive correlations with segmental, word stress, and speech rate ratings obtained from native raters. Baker-Smemoe and Haslam (2013) examined the relationship between L2 learners’ pronunciation proficiency (operationalized as production accuracy, reduced accentedness, and fluency) and aptitude (as well as motivation and various strategies). They found that sound discrimination ability (measured via the Pimsleur Language Aptitude Battery, PLAB) was associated with reduced accentedness, and that higher comprehensibility was predicted by higher motivation and the use of various learning strategies. Similarly, Hu et al. (2013) found that higher phonemic coding ability predicts better L2 pronunciation performance. More recent work has suggested that (a) phonemic coding ability and rote memory (measured via the LLAMA E and B) contributed to quick improvements in accuracy and fluency; and (b) sound sequence recognition (measured via the LLAMA D) facilitated comprehensibility in the long run by enhancing learners’ accurate production of segmentals (Saito, Suzukida & Sun, 2019). Such evidence indicates that sound sequence recognition may also tap into L2 learners’ implicit learning aptitude.
Motivation
Motivation is believed to play a role in initiating and maintaining learners’ efforts to learn an L2 (e.g., Gardner, 2007). Researchers have found that learners’ motivation, and especially their concern for native-like L2 pronunciation, is a key predictor of reduced foreign accent (Gonet, 2006; Moyer, 2014). For example, Gonet's (2006) classroom study of Polish English as a Foreign Language (EFL) learners found that motivation was the strongest contributor to L2 pronunciation acquisition.
Recently, Dörnyei's (2005) L2 Motivational Self System has been increasingly used to explore different motivational orientations, learning behaviors, and learning outcomes in the FL classroom setting (Dörnyei & Chan, 2013). The model consists of two components, or self-guides: the Ought-to L2 self (i.e., an imposed self-image related to obligation and avoidance) and the Ideal L2 self (i.e., an idealized self-image of an L2 user). Both components are considered to be closely associated with the extent to which learners are committed to studying, practicing, and using an L2 for an extensive period of time (e.g., Ushioda, 2016). Furthermore, higher levels of Ideal L2 self have been linked with positive L2 learning outcomes (e.g., Dörnyei & Chan, 2013). In L2 pronunciation research, however, only a handful of studies have examined the link between possible selves and L2 speech performance (e.g., Nagle, 2018a; Saito, Dewaele, Abe & In'nami, 2018). Saito et al. (2018) found a link between the two self-guides and L2 experience, and also found a positive correlation between higher Ideal L2 self and comprehensibility. Based on these findings, the authors suggested that Ideal L2 self may be a key factor in enhancing learners’ information processing and in helping them make the most of the available opportunities to receive input and produce speech in the L2. However, as the available research evidence is limited (e.g., Nagle, 2018a), further research in the EFL setting is required to confirm the robust influence of self-guides on L2 pronunciation learning.
Anxiety
Another factor worthy of attention in L2 pronunciation learning is anxiety. Since Horwitz and colleagues’ development of the Foreign Language Classroom Anxiety Scale (FLCAS; Horwitz, Horwitz & Cope, 1986), learners’ anxiety in the classroom has been explored as a predictor of L2 performance (for a meta-analysis, see Teimouri, Goetze & Plonsky, 2019). According to Baran-Łucarz (2016), L2 pronunciation learning engenders a specific form of anxiety due to the perceived discrepancy between a learner's current pronunciation and the level of pronunciation they expect/desire to reach. Moreover, learners’ self-perception of their pronunciation skill or their willingness to accept target-like pronunciation and modify their own pronunciation is believed to result in some changes to their actual behaviors (Baran-Łucarz, 2016). Therefore, more recently, scholars have begun to conceptualize an anxiety unique to pronunciation learning, operationalizing it either through the Measure of Pronunciation Anxiety in the FL Classroom (Baran-Łucarz, 2016) or as part of the Learner Attitudes and Motivations for Pronunciation inventory (Sardegna et al., 2014).
Research in the field of cognitive psychology has suggested that anxiety influences the cognitive, psychological, and behavioral aspects of learning. For instance, high anxiety has been shown to decrease the efficiency of cognitive functioning during task execution, to lead to panic and shakiness, and to result in task avoidance (e.g., Vasa & Pine, 2004). Because anxiety can hinder one's attention control, it is believed to impair language learners’ ability to receive and process input, and to produce output (Piechurska-Kuciel, 2008). These negative impacts have been extended to L2 pronunciation learning as well (Baran-Łucarz, 2013). While pronunciation-specific anxiety has been explored in relation to learners’ self-rated proficiency (e.g., Szyszka, 2011), only a few empirical studies have explored proficiency as rated by others (cf. Saito et al., 2018). For example, Saito et al. (2018) found that anxiety, measured via the FLCAS, was significantly correlated with comprehensibility. Their findings not only support the assertion that anxiety is an emotion that is shaped through the accumulation of one's learning experience over time (Dewaele & Dewaele, 2017), but also shed light on the possible impact of negative emotions on pronunciation learning. However, more studies are needed to fully understand the relationship between anxiety and L2 pronunciation, particularly those which seek to identify how pronunciation-specific anxiety influences L2 pronunciation learning.
Motivation for current study
As reviewed above, previous research has explored various cognitive and sociopsychological IDs as potential predictors of L2 pronunciation learning success. However, there is little crosstalk between the two different groups of ID researchers. In other words, we do not yet know how cognitive and sociopsychological factors interact to impact different dimensions of L2 acquisition. One exception to this is Serafini (2017), who adopted a DST framework and took a longitudinal approach towards exploring the dynamic relationships between cognitive and sociopsychological IDs and general L2 proficiency. The study focused on the links between working memory (executive function and phonological working memory), anxiety, attitude, and motivation of American learners of Spanish in the U.S. The results suggested that the roles of IDs differed significantly depending on the timing of data collection (onset vs. endpoint) and learners’ proficiency levels.
In discussing the results, Serafini (2017) stressed the importance of adopting an integrative perspective in researching IDs in order to accurately represent them as a set of dynamic and complex factors that affect L2 development. To our knowledge, however, no studies have taken such an approach towards investigating the differential impact of cognitive and sociopsychological IDs on L2 pronunciation learning (cf. Baran-Łucarz, 2017, for motivation and anxiety; Baker-Smemoe & Haslam, 2013, for aptitude and motivation). Therefore, the primary focus of the current study was to understand the complex contributions of cognitive, motivational, and emotional IDs towards two different dimensions of L2 speech acquisition (enhancing comprehensibility vs. reducing foreign accentedness). To capture the dynamic nature of the ID-acquisition link, we designed a two-part study wherein we looked at the role of experience, aptitude, motivation, and emotion in L2 speech learning from both cross-sectional and longitudinal perspectives. In the cross-sectional phase (Study 1), the relationship between students’ initial IDs and L2 pronunciation profiles was examined at the start of data collection. In the longitudinal phase (Study 2), the same participants’ IDs were linked to their speech development during L2 pronunciation training. Following DST researchers' views on learner IDs (i.e., Serafini, 2017), and in keeping with the notion of the trilogy of mind, we focused on foreign language aptitude, motivation, and anxiety as proxies for the cognitive, motivational, and emotional aspects of L2 learners, respectively. Lastly, pronunciation was evaluated multidimensionally in terms of the degree of accentedness and comprehensibility. The research questions were formulated as follows:
1. Study 1: How are the comprehensibility and accentedness aspects of L2 speech differentially associated with speakers’ experience and cognitive, motivational, and emotional ID factors at the onset of the project?
2. Study 2: How is L2 learners’ speech development mediated by their cognitive, motivational, and emotional ID profiles when they receive explicit pronunciation instruction?
The following predictions were made based on previous ID research. Studies on L2 experience and pronunciation learning have demonstrated that accuracy in producing segmental and suprasegmental features develops according to the amount of recent and meaning-oriented interaction (Saito & Hanzawa, 2016). Specifically, it has been found that participants who have recently participated in extensive extracurricular L2 learning experiences (e.g., informal interactions with native and fluent non-native speakers in the target language) and classroom-based L2 speaking activities exhibit better comprehensibility and accentedness. In other words, it seems that high-quality speech can be achieved through exposure to rich linguistic input and formal instruction (e.g., Derwing & Munro, 2013, for evidence within naturalistic settings; Muñoz, 2014, for classroom settings).
When it comes to aptitude, research has shown that participants with greater phonemic coding ability and sound sequence recognition may demonstrate better accentedness (more nativelike) scores. This is arguably because these abilities help learners attend to specific segmental and prosodic details in the input they receive (Saito et al., 2019). Therefore, we predicted that the same pattern would be found in the current study. By contrast, the relationship between aptitude and comprehensibility has been shown to be weak at best. There is ample evidence that many L2 learners can continue to improve their comprehensibility (but not nativelikeness) as long as they are willing to use and practice the target language on a daily basis (Derwing & Munro, 2013). The linguistic features that contribute to comprehensibility are not necessarily limited to the accuracy of phonological features (e.g., Suzuki & Kormos, 2019), and thus may be unrelated to any aspects of phonological aptitude (e.g., phonemic coding ability).
With respect to the link between L2 learning motivation and pronunciation, previous studies have found that certain types of motivation may help learners notice detailed features of input under implicit learning conditions (e.g., Ushioda, 2016). In fact, there is evidence that learners who are more internally motivated (i.e., who have a highly developed Ideal L2 self) are able to make the most of the available input and thus see greater improvements in comprehensibility (e.g., Saito et al., 2018). However, longitudinal studies of learners in naturalistic contexts have shown that reducing foreign accentedness requires years of experience using the target language (e.g., Munro & Derwing, 2008). Thus, a strong sense of Ideal L2 self may not be directly linked to reduced accentedness. When it comes to Ought-to L2 self (i.e., the perceived obligation for learning), evidence suggests that it may not significantly predict L2 pronunciation acquisition (Saito et al., 2018). The construct of Ought-to L2 self has multiple layers, and a sense of obligation can serve as either a facilitator or a hindrance of L2 use. However, the current study follows the findings of the past study and predicts that learning a target language because of obligation may not necessarily lead to increased L2 use and L2 exposure.
Lastly, those who report a high degree of pronunciation learning anxiety may not be able to successfully refine their perception of L2 segmental and prosodic features (Piechurska-Kuciel, 2008). This is because anxiety can act as a further barrier to gaining opportunities to receive L2 input, and ultimately impede speech production and learning (e.g., Vasa & Pine, 2004). Hence, we predict that the learners with higher degrees of anxiety may show higher accentedness and lower comprehensibility scores.
As for the second objective of the current study (Study 2), we set out to explore the relationship between IDs and pronunciation learning in the context of explicit pronunciation instruction. Given that instruction is believed to equally facilitate adult L2 learners’ pronunciation proficiency regardless of differences in the cognitive and sociopsychological profiles among L2 learners (Pennington, 2021), our prediction is that participants will be able to significantly enhance their comprehensibility and reduce their accentedness over time. Furthermore, the ID variables found to affect the participants’ pronunciation proficiency at the onset may also influence the outcome of the instruction.
Study 1: Cross-sectional investigation
Participants
A total of 73 Japanese learners of English with varied learning experiences and backgrounds were recruited in Japan and included in the main analyses. Those learners reported that they had no prior experience of living or studying in English-speaking countries. None of them had received any intensive pronunciation training in private English conversation schools or via private tutoring from English teachers at regular schools. They were first-year undergraduate students from various majors (e.g., engineering, medicine, sociology, education, literature, and cultural studies) and their average age was 19.41 years at the time of the project (range = 18–20).
Procedure
After obtaining the necessary permissions from the universities in Japan, participants were recruited via posters and mailing lists. Interested students contacted one of the researchers, at which point the researcher scheduled individual appointments with each of the possible participants to determine candidacy. Upon completing a set of consent forms, the participants performed a spontaneous speech task, and took the LLAMA test on the researcher's laptop (approximately 30 minutes). Finally, they filled out a questionnaire sheet containing a set of questions about their language-learning background, L2 pronunciation learning motivation, and L2 pronunciation learning anxiety. The entire session lasted approximately 60 minutes.
Measures of individual differences
Aptitude test
In order to measure the participants’ foreign language learning aptitude, the LLAMA test was used (Meara, 2005). The test was chosen not only for its popularity in SLA research (e.g., Bylund, Abrahamsson & Hyltenstam, 2010; Forsberg & Sandgren, 2013) but, most importantly, for its first-language-independent nature (in comparison to other available tests that are mainly for English native speakers). The sub-tests chosen for the current study were sound sequence recognition (LLAMA D) for implicit learning aptitude (Granena, 2013; see Suzuki, 2021, for validation), and associative memory (LLAMA B) and phonemic coding ability (LLAMA E) for explicit learning aptitude. The maximum score is 75% for LLAMA D and 100% for LLAMA B and E. The entire test session for measuring aptitude took approximately 30 minutes. Descriptive statistics of participants’ aptitude scores are presented in Supporting Information I.
Questionnaire instruments
After taking the aptitude test, the participants were asked to fill out a set of Likert-scale questionnaires designed to capture their L2 experience, L2 pronunciation-specific anxiety, and L2 pronunciation-specific motivation, respectively. Following previous ID studies (e.g., Kissling, 2014; Saito et al., 2018), we prepared a tailored questionnaire based on the Language Contact Profile (Freed, Dewey, Segalowitz & Halter, 2004) to measure the participants’ L2 experience. The items were designed to capture (a) the participants’ past L2 learning experience before university (i.e., at elementary, junior high, and high schools), and (b) the participants’ current L2 learning experience at university. In addition to this distinction (i.e., past vs. recent), each type of L2 learning experience was further divided into time spent studying English inside regular curricular classes and time spent using English in conversations with other users of English (i.e., native and non-native speakers of English) outside the classroom (cf. Kissling, 2014, for a similar decision). Based on the participants’ answers, total hours of L2 experience were calculated to create four experiential variables: past English learning inside formal classrooms, past English use outside formal classrooms, recent English learning inside formal classrooms, and recent English use outside formal classrooms.
In terms of anxiety, the current study did not employ the oft-used Foreign Language Classroom Anxiety Scale by Horwitz and colleagues because our focus was skill-specific (L2 pronunciation). Instead, the questionnaire developed by Baran-Łucarz (2016) was adopted in order to measure the participants’ L2 pronunciation-specific anxiety (see Supporting Information I for the items and descriptive statistics).
Finally, to measure the participants’ pronunciation-specific motivation and anxiety, the current study utilized the questionnaire items used in Baran-Łucarz (2017), which ask about learners’ degree of Ideal L2 self, Ought-to L2 self, and anxiety in terms of L2 pronunciation learning (e.g., “I imagine myself as someone who is able to speak English with accented but comprehensible pronunciation.”). The details of L2 pronunciation-specific anxiety and L2 pronunciation-specific motivation are summarized in Supporting Information I. In order to help the participants understand the questionnaire items, all the questions were translated into Japanese by the researcher and double-checked by two translators. Since the Cronbach's alpha values of each construct indicated a relatively high level of internal consistency (α = .92 for Ideal L2 self, α = .92 for Ought-to L2 self, and α = .83 for anxiety), an averaged score for each construct was computed. Finally, the interrelationship among the IDs and L2 experience variables was examined (see Supporting Information II).
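For reference, the internal-consistency values above follow the standard definition of Cronbach's alpha. The notation here is introduced only for illustration and is not taken from the questionnaire sources: k denotes the number of items in a construct, \sigma^2_i the variance of responses to item i, and \sigma^2_X the variance of participants' summed scores across the k items.

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{i}}{\sigma^{2}_{X}}\right)
```

Alpha approaches 1 when the items covary strongly, so that the variance of the summed score is large relative to the sum of the individual item variances; this is the rationale for averaging items within a construct once alpha is judged sufficiently high.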
Pronunciation proficiency measures
Speaking task
In order to tap into learners’ less-controlled pronunciation knowledge, a semi-spontaneous speech task was adapted from the EIKEN English Test (EIKEN Foundation of Japan, 2016; also see Lambert, Kormos & Minn, 2017). Following the testing procedure established by EIKEN, the task sheet included four sequential pictures with several linguistic aids and a sentence to start the description. In order to control for topic effects, two different picture stories were used (Story A and Story B; for the details of the task sheet, see Supporting Information III). Half of the participants described Story A, and the remainder described Story B. The first 30 seconds of each of the 73 speech samples (approximately 2 minutes each) were extracted and saved as WAV files for the speech rating.
L2 pronunciation proficiency rating
Whereas some studies have examined L2 pronunciation proficiency via trained raters’ assessments in accordance with detailed descriptors (e.g., Isaacs, Trofimovich, Yu & Chereau, 2015), much research attention has been given to untrained raters’ intuitive judgements of L2 pronunciation proficiency. As seen in a range of existing studies (e.g., Derwing & Munro, 2013; Nagle, 2018a), we operationalized such intuitive judgements through scalar judgements of overall comprehensibility and accentedness.
Four raters (2 females, 2 males) with linguistic and pedagogical backgrounds were recruited in London. According to the research on listener factors, listeners’ judgments are likely to be affected by factors such as their familiarity with the accent (e.g., Winke, Gass & Myford, 2013) and their language teaching experience (e.g., Kennedy & Trofimovich, 2008). Following previous studies that employed subjective speech rating (e.g., Nagle, 2018a; Suzuki & Kormos, 2019), we carefully controlled for familiarity with Japanese-accented English. On a 6-point scale (1 = not at all, 6 = very much), all four raters reported a high level of familiarity with Japanese-accented English (M = 5.5; range = 5–6). Thus, it was assumed that leniency towards the speech samples was relatively similar among the four raters and that, owing to their high familiarity with Japanese-accented English, they were sufficiently sensitive to the speakers’ use of the Japanese sound system in the speech samples. All of them held master's degrees in applied linguistics and reported extensive experience in teaching English (M = 7.8 years) and in participating in speech analyses of this kind. None of them reported any hearing problems.
Procedure of the pronunciation rating
The rating session was conducted via individual meetings with one of the researchers in a quiet room at a university in London, UK. The researcher helped the raters familiarize themselves with the rating procedure as well as the evaluation criteria. Using a printed booklet, the raters were asked to listen to speech samples through headphones connected to a laptop computer, and subsequently to evaluate the samples by circling a number on a 9-point scale for accentedness (1 = heavily accented, 9 = not accented at all) and comprehensibility (1 = difficult to understand, 9 = easy to understand) on a rating sheet. To ensure accurate and smooth rating, one of the researchers first provided a short training session to each of the raters prior to the main session. The training session included a brief explanation of the definitions of comprehensibility and accentedness, and a practice rating with three speech samples that were not included in the main dataset (see Supporting Information IV for the training script). In order to ensure that the raters sufficiently understood the two constructs, the researcher asked the raters to explain their reasoning. Based on the explanations given, the researcher provided them with feedback. Subsequently, the raters proceeded to the main session. To avoid fatigue, the raters took 15-minute breaks after one third and after two thirds of the speech samples had been evaluated. The entire session lasted approximately 65 minutes per rater.
After all of the rating sessions were completed, the inter-rater reliability of the comprehensibility and accentedness ratings was calculated. Cronbach's alpha for the four raters’ judgments was α = .82 for comprehensibility and α = .80 for accentedness. Since these values indicate acceptable agreement based on Larson-Hall's (Reference Larson-Hall2010) benchmark (α > .70), the four raters’ judgments were averaged to represent each speaker's comprehensibility and accentedness scores. Footnote 1
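The reliability check and averaging described above can be sketched as follows. This is a minimal illustration, assuming that raters are treated as the "items" in the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ rater variances / variance of summed scores). The function names and the ratings are invented for the example and are not the study's data or scripts.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters treated as items.

    ratings: one list of scores per rater, all covering the same
    speakers in the same order (hypothetical layout).
    """
    k = len(ratings)
    n = len(ratings[0])
    if k < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need at least two raters with equal-length score lists")
    # Sum of each rater's sample variance across speakers.
    rater_variance_sum = sum(variance(r) for r in ratings)
    # Variance of the per-speaker totals (summed over raters).
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    return k / (k - 1) * (1 - rater_variance_sum / variance(totals))


def average_across_raters(ratings):
    """Mean score per speaker, pooling all raters."""
    n = len(ratings[0])
    return [mean(r[i] for r in ratings) for i in range(n)]


# Invented 9-point ratings: four raters, five speakers.
example_ratings = [
    [3, 5, 6, 2, 7],
    [4, 5, 7, 3, 8],
    [3, 4, 6, 2, 6],
    [2, 6, 7, 3, 7],
]
alpha = cronbach_alpha(example_ratings)
speaker_scores = average_across_raters(example_ratings)
```

Averaging is only applied once alpha clears the chosen benchmark (here, α > .70), which mirrors the order of steps reported above.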
Results
Constructing mixed-effects models
Study 1 set out to examine how experiential, cognitive and sociopsychological IDs differentially influence the L2 pronunciation of 73 Japanese learners of English. For this purpose, the current study used mixed-effects modeling in R (R Core Team, 2018) with the lme4 package, and built models that predict the learners’ comprehensibility and accentedness scores. Prior to model construction, the assumptions (linearity, homoscedasticity, normal distribution) were tested via residual analyses. The fixed effects in the modelling included sound sequence recognition, phonemic coding ability, associative memory, ideal L2 self, ought-to L2 self, and anxiety (these variables were collected at a single point in time). In order to control for the effect of L2 experience on the participants’ comprehensibility and accentedness, past and recent L2 experience were also included as fixed effects. These experience-related variables comprise the number of hours of regular English classes (inside-classroom experience) and the number of hours of conversation with native and non-native speakers of English outside the regular English classes (outside-classroom experience). Furthermore, to ensure the comparability of fixed effects that were measured on different scales, all fixed effects were converted to z-scores prior to the analyses. To evaluate the models, we employed the pairwise Likelihood Ratio Test (Baayen, Reference Baayen2008) with the forward selection method to see whether the compared model decreased Akaike's Information Criterion (AIC; an estimator of the relative amount of information lost by a particular model). Variables that did not improve the model fit in these comparisons were discarded. The variance inflation factors (VIFs) of all the predictors were below 2.0.
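The standardization and AIC-based forward selection described above can be illustrated with a short sketch. It is written in Python rather than R, does not fit any mixed-effects model, and uses invented log-likelihood values; the helper names (`z_scores`, `aic`, `forward_step`) are hypothetical. It only shows the arithmetic: z = (x − mean) / SD, and AIC = 2k − 2 ln L, with a candidate predictor retained only if it lowers AIC.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a predictor so that it has mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def forward_step(current_aic, candidates):
    """One forward-selection step.

    candidates maps a predictor name to (log_likelihood, n_params) of the
    model that adds that predictor. Returns the name that lowers AIC the
    most, or None if no candidate improves on the current model.
    """
    best_name, best_aic = None, current_aic
    for name, (log_lik, k) in candidates.items():
        candidate_aic = aic(log_lik, k)
        if candidate_aic < best_aic:
            best_name, best_aic = name, candidate_aic
    return best_name


# Invented values for illustration only (not taken from the study).
hours_outside_class = [0, 2, 5, 1, 8, 3]
standardized = z_scores(hours_outside_class)

baseline = aic(log_likelihood=-110.0, n_params=3)
chosen = forward_step(
    baseline,
    {
        "phonemic_coding": (-106.5, 4),
        "associative_memory": (-109.8, 4),
    },
)
```

In practice the log-likelihoods would come from the fitted models themselves (e.g., from lme4 in R); the step above is repeated until no remaining candidate lowers AIC.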
Predictors of L2 pronunciation proficiency at the onset of the project
Accentedness and IDs
According to a series of model comparisons based on AIC values (for the details of the constructed models, see Supporting Information V), the final model suggested that phonemic coding ability (β = .24), anxiety (β = -.25), and recent English learning outside the classroom (β = .51) made significant contributions to the accentedness score (Table 1). The predictive power of these variables was further confirmed by inspecting their 95% confidence intervals: none of the intervals for the estimated regression coefficients included zero. The fixed effects in the final model explained a substantial amount of variance in the accentedness score (marginal R² = .44).
Note. DIC = Deviance Information Criterion; AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion
Comprehensibility and IDs
The model comparisons revealed that the model with the lowest AIC value included recent English learning outside the classroom (β = .30) and recent English learning inside the classroom (β = .28) as statistically significant predictors of higher comprehensibility (AIC = 210.25; for the model comparisons, see Supporting Information V). Furthermore, inspection of the 95% confidence intervals confirmed the positive contributions of these variables to comprehensibility. Of the ten candidate variables, the fixed effects retained in the final model accounted for 20% of the total variance (marginal R² = .20).
Study 2: Longitudinal investigation
The findings of Study 1 revealed that the ID profiles of Japanese EFL students (with years of foreign language education) were differentially related to comprehensibility and accentedness scores. The participants demonstrated higher comprehensibility as long as they regularly practiced the target language both inside and outside of the classroom. However, those with more nativelike pronunciation tended to access L2 English beyond the classroom setting, demonstrated greater phonetic aptitude, and had less anxiety. One obvious limitation of Study 1 is that the data were collected at a single time point. Since the ID-proficiency link is dynamic and ever-changing in nature, Study 2 was designed to replicate the findings of Study 1 (i.e., more ID effects for accentedness than for comprehensibility) using a longitudinal approach. The goal of Study 2 was to assess the mediating roles of aptitude, anxiety and motivation in the development of L2 comprehensibility and accentedness when participants received explicit pronunciation instruction for four weeks (50 minutes × 4 weeks). Since existing research on L2 pronunciation instruction has demonstrated the effectiveness of explicit instruction on L2 segmental and suprasegmental proficiency (see Saito & Plonsky, Reference Saito and Plonsky2019 for a review), it was assumed that the treatment (i.e., pronunciation instruction) in the current study would positively impact the comprehensibility and accentedness of participants’ L2 speech.
Participants
Out of the 73 participants who took the tests at the onset of the project, 63 agreed to participate in Study 2. In order to verify that pronunciation instruction helped L2 learners make tangible improvements in accentedness and comprehensibility, participants were assigned either to the experimental group, which received pronunciation instruction (n = 51), or to the control group, which received grammar instruction (n = 12). The latter group did not receive any pronunciation instruction. The number of participants in the experimental group was considerably larger because the main objective of Study 2 lay in the role of IDs in L2 pronunciation learning gains. The purpose of the control group was to demonstrate test-retest effects, given that similar materials were used for the pre- and post-tests. Both the experimental and control groups received 50 minutes of instruction every week for 4 weeks. The procedure is summarized in Figure 1.
Treatment: Experimental group
Explicit pronunciation instruction was provided to the participants in the experimental group. The L2 pronunciation instruction used in past research can be broadly categorized into articulatory-based and auditory-based instruction, with the former highlighting L2 learners’ understanding of the manner and place of articulation of sounds in contrast to their L1, and the latter emphasizing L2 learners’ perceptual development of sounds by introducing similarities and dissimilarities between L2 sounds and their L1 counterparts (Saito & Plonsky, Reference Saito and Plonsky2019). Since perception and production are assumed to complement each other to facilitate L2 speech learning (Nagle, Reference Nagle2018b), the training materials in the current study comprised both perception- and production-based practice activities (see Couper, Reference Couper2003 for a similar approach; for a detailed description of the intervention, see Supporting Information VI and Mora-Plaza, Saito, Suzukida, Dewaele, & Tierney, 2022). The sessions were led by a researcher who is a native speaker of Japanese, holds a master's degree in TESOL, and is highly proficient in English. A non-native teacher was used because non-native teachers have been shown to be capable of providing effective pronunciation instruction (Levis, Sonsaat, Link & Barriuso, Reference Levis, Sonsaat, Link and Barriuso2016), and because teachers/listeners who share the learners' L1 are better equipped to notice pronunciation errors that derive from the L1 phonological system (e.g., Riney, Takada & Ota, Reference Riney, Takada and Ota2000).
Treatment: Control group
The control group received grammar instruction with exercises (e.g., filling in the blanks, passage comprehension, error recognition) chosen from the textbook for The Test of English for International Communication (TOEIC) (Trew, Reference Trew2007).
Pronunciation proficiency measures
The same picture description tasks as in Study 1 were used for the post-tests. To ensure that participants did not work on the same prompts twice, however, the two versions of the pictures were counterbalanced for each participant (Story A → Story B; Story B → Story A). Following the same procedure as in Study 1, the same expert raters (four linguistically trained native speakers) listened to all the speech samples in a randomized order (122 samples), and made intuitive judgements of comprehensibility and accentedness. Given that the raters demonstrated an adequate level of agreement (α > .80), their rating scores were averaged to derive a single comprehensibility score and a single accentedness score for each participant at pre- and post-test, respectively.
Results
Constructing mixed-effects models
Study 2 set out to investigate the potential moderating effect of learner IDs and experiential variables on the effectiveness of pronunciation instruction. Therefore, after the residuals were inspected to confirm that the statistical assumptions for mixed-effects models were met, the interactions between instruction (i.e., Time) and learner IDs were examined using the following procedure. First, interaction terms were prepared by combining instruction (Time) with one fixed effect (e.g., sound sequence recognition). After the interaction terms had been prepared for all the fixed effects, the models were run individually.
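As a rough illustration of how a Time × ID interaction term can be prepared, the sketch below reshapes pre/post scores into one row per participant per test and multiplies a Time dummy (0 = pre-test, 1 = post-test) by a z-scored ID variable. The coding scheme, field names and values are assumptions made for the example; the original analysis was run in R and its exact coding is not reported here.

```python
def build_long_rows(participants, id_variable):
    """Return one row per participant per time point with a Time x ID term.

    participants: list of dicts with keys "id", "pre", "post" and the
    z-scored ID variable named by id_variable (hypothetical layout).
    """
    rows = []
    for p in participants:
        for time, score in ((0, p["pre"]), (1, p["post"])):
            rows.append(
                {
                    "participant": p["id"],  # grouping factor for the random intercept
                    "time": time,            # 0 = pre-test, 1 = post-test
                    id_variable: p[id_variable],
                    # Interaction term: zero at pre-test, equal to the ID score at post-test.
                    "time_x_" + id_variable: time * p[id_variable],
                    "score": score,
                }
            )
    return rows


# Invented data for two participants (not from the study).
sample = [
    {"id": "P01", "pre": 4.25, "post": 5.00, "sound_sequence_z": 0.8},
    {"id": "P02", "pre": 3.50, "post": 4.75, "sound_sequence_z": -1.1},
]
long_rows = build_long_rows(sample, "sound_sequence_z")
```

With this coding, a significant coefficient on the interaction column would indicate that pre-to-post gains differ according to the ID variable, which is the moderating effect the models test for.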
Effectiveness of pronunciation instruction
In order to make sure that the groups did not differ in terms of their ID profiles, L2 experience and L2 pronunciation, a series of statistical analyses was conducted. First, prior to the t-tests, Levene's test was conducted to test the hypothesis of equal population variances. Since none of the variables showed statistical significance, the null hypothesis of equal population variances was not rejected. Therefore, a series of t-tests was conducted to examine possible differences between the two groups. Due to the uneven number of participants in each group (51 vs. 12), Welch's t-test was used. According to the results, the experimental and control groups were not statistically different in terms of their pre-test comprehensibility and accentedness scores or their ID profiles. After the intervention, the pre- and post-test scores within each group were compared using paired-samples t-tests. The results indicated that only the experimental group showed statistically significant improvements in comprehensibility and accentedness (t = 6.468, p < .001 for comprehensibility; t = 8.436, p < .001 for accentedness). Concerning the effect size of the treatment, Cohen's d was calculated (d = 0.7 for comprehensibility and d = 1.3 for accentedness). According to Plonsky and Oswald's (Reference Plonsky and Oswald2014) field-specific benchmarks, these results can be considered medium to large effects. Therefore, the results suggest that pronunciation instruction was facilitative of both L2 comprehensibility and accentedness.
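The two statistics used above can be written out in a few lines. The sketch assumes the usual Welch formulation (unpooled variances with the Welch–Satterthwaite degrees of freedom) and, for the effect size, a Cohen's d that divides the mean gain by the pooled pre/post standard deviation; the paper does not state which variant of d was computed, so that choice is an assumption. All numbers are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent groups."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


def cohens_d_pre_post(pre, post):
    """Mean gain divided by the pooled SD of pre- and post-test scores.

    One common variant among several; assumed here for illustration.
    """
    pooled_sd = sqrt((variance(pre) + variance(post)) / 2)
    return (mean(post) - mean(pre)) / pooled_sd


# Invented pre-test scores for two groups of unequal size.
experimental_pre = [4.0, 4.5, 3.75, 5.0, 4.25, 3.5]
control_pre = [4.25, 3.75, 4.5]
t_stat, dof = welch_t(experimental_pre, control_pre)

# Invented pre/post scores for one group.
gain_d = cohens_d_pre_post(pre=[3.5, 4.0, 4.5, 5.0], post=[4.5, 4.75, 5.5, 5.75])
```

Welch's version is a reasonable default when group sizes are as unbalanced as 51 vs. 12, since it does not rely on the equal-variance assumption.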
The roles of aptitude, motivation and anxiety in the effectiveness of instruction
According to the results of the mixed-effects modelling, none of the ID variables elicited from the experimental group (who received the pronunciation instruction) showed a statistically significant interaction effect (i.e., p > .220). The estimated beta values, standard errors, and t-values of the models that include the interactions are summarized in Table 3 for accentedness and Table 4 for comprehensibility. The results suggest (a) that IDs made no unique contribution to changes in comprehensibility and accentedness over time, and (b) that explicit instruction can help learners enhance the comprehensibility and accentedness aspects of L2 pronunciation proficiency regardless of their ID profiles.
Note. DIC = Deviance Information Criterion; AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion
Discussion
Focusing on the EFL context, the current study sought to examine the complex and dynamic mechanisms underlying adult L2 speech learning. To this end, we conducted cross-sectional and longitudinal investigations of how Japanese EFL students with different experiential, cognitive and sociopsychological IDs attained two different constructs of L2 pronunciation proficiency (comprehensibility and accentedness) after years of EFL education, and following pronunciation instruction. Two overall conclusions were derived. First, we argue that L2 speech learning is a highly complex phenomenon that needs to be scrutinized not only along learner dimensions (experiential, cognitive, and sociopsychological IDs), but also along linguistic dimensions (comprehensibility vs. accentedness). Second, we argue that provision of instruction can be equally effective regardless of differences in L2 learners’ cognitive and sociopsychological profiles.
R1: Roles of IDs in L2 pronunciation learning
Overall, the results confirm that experiential factors, and different aspects of IDs, play important roles in determining how, and to what degree, learners can develop their L2 speech. A positive relationship was found between time spent in the regular English classes at the university and comprehensibility. In light of evidence from previous L2 pronunciation studies that accentedness is mainly linked to segmental and suprasegmental accuracy (i.e., phonological accuracy), and that comprehensibility is associated with a wider range of linguistic features such as temporal, lexical, grammatical, and phonological accuracy (e.g., Trofimovich & Isaacs, Reference Trofimovich and Isaacs2012), it seems that the participants’ regular English classes may have helped them improve the temporal and lexicogrammatical aspects of their speech. It is noteworthy, however, that recent L2 use outside the regular English classes at the university (i.e., using L2 for communication with native and non-native speakers of English) was strongly associated with both comprehensibility and accentedness. Echoing findings from previous studies that have examined the influence of L2 experience (e.g., Baker-Smemoe & Haslam, Reference Baker-Smemoe and Haslam2013; Saito & Hanzawa, Reference Saito and Hanzawa2016), this confirms the importance of extensive exposure to, and use of, the target language in pronunciation learning (e.g., Flege, Reference Flege2016). Since this variable was associated with both accentedness and comprehensibility, it can be concluded that input and output beyond one's regular L2 experience can help further strengthen and refine one's accumulated knowledge of pronunciation and lexicogrammar. The positive links between the two types of L2 experience (classroom English learning experience vs. extracurricular conversations with native/non-native speakers) and the two dimensions of L2 pronunciation offer additional evidence for the experience-driven account of successful L2 speech learning (e.g., Muñoz, Reference Muñoz2014). This account holds that, in the EFL classroom setting, English learning experience can lead to improvements in comprehensibility via improvements in the accuracy of various pronunciation features. However, learners who make extra efforts to increase the amount of L2 use/exposure outside the classroom (e.g., communication with international friends) may be able to reduce their degree of L1 phonological transfer and consequently reduce their accentedness.
Asymmetric patterns were found regarding the influence of cognitive and psychosocial factors: higher phonemic coding ability and lower anxiety were associated with reduced L2 accentedness, but no such factors were related to L2 comprehensibility. This could partially be explained by the differences between the constructs of accentedness and comprehensibility. Specifically, L2 pronunciation studies have revealed that accentedness is mainly linked to segmental and suprasegmental accuracy (i.e., phonological accuracy) whereas comprehensibility is associated with a wider range of linguistic features such as temporal, lexical, grammatical, and phonological accuracy (e.g., Trofimovich & Isaacs, Reference Trofimovich and Isaacs2012). Based on these results, it can be concluded that L2 learners who have higher phonemic coding ability and lower anxiety may have been able to successfully reduce the use of their L1 sound system (i.e., Japanese) in L2 speech, resulting in improved segmental and suprasegmental accuracy. Because both phonemic coding and anxiety are believed to be involved in information processing (e.g., Baran-Łucarz, 2013; Skehan, Reference Skehan, Granena, Jackson and Yilmaz2016), it can also be concluded that higher phonemic coding ability and/or lower anxiety could help learners notice cross-linguistic differences, retain analyzed auditory information, and integrate it into their L2 systems.
In line with past research on explicit learning aptitude and L2 pronunciation (e.g., Baker-Smemoe & Haslam, Reference Baker-Smemoe and Haslam2013; Saito et al., Reference Saito, Suzukida and Sun2019 for cross-sectional evidence), the results of the current study support the idea that phonemic coding ability helps learners improve the segmental and suprasegmental aspects of their speech (i.e., accentedness). However, unlike other cross-sectional studies which have found an association between associative memory, superior grammatical complexity, and speed fluency (e.g., Saito et al., Reference Saito, Suzukida and Sun2019), higher associative memory was not found to be a predictor of comprehensibility or accentedness in the current study.
These results could be explained, on the one hand, by the notion that the participants’ use of grammar and/or temporal features may not have been fully reflected in the raters’ judgements. However, an alternative explanation can be provided as well. Previous aptitude research has shown that associative memory can help learners retain a vast amount of lexical knowledge, relate new information to existing knowledge, and control the delivery of such knowledge efficiently, and that it is therefore mainly involved in the later stages of L2 acquisition, i.e., the proceduralization and automatization of acquired knowledge (e.g., Skehan, Reference Skehan, Granena, Jackson and Yilmaz2016). Based on this, it is reasonable to assume that the participants in the current study may not yet have reached the later stages of acquisition, and/or may not have had sufficient declarative knowledge to benefit from their superior associative memory. The same account could also explain the non-significant relationship found between sound sequence recognition and L2 pronunciation. Sound sequence recognition is believed to help L2 learners attend to L2 phonological and word sequences in an incidental and implicit fashion. It is thus considered to be essential in the later stages of L2 acquisition, i.e., for the further refinement of L2 sound processing ability and the attainment of nativelike L2 pronunciation (e.g., Granena, Reference Granena, Grañena and Long2013 for naturalistic setting; Saito et al., Reference Saito, Suzukida and Sun2019 for FL setting). Thus, even participants with higher sound sequence recognition may still have been in the earlier stages of L2 pronunciation acquisition, where the explicit processing and analysis of L2 sounds are more instrumental to success.
Next, a negative relationship was found between anxiety and reduced accentedness. Such a result concurs with previous studies showing that anxiety can affect L2 pronunciation acquisition (e.g., Saito et al., Reference Saito, Dewaele, Abe and In'nami2018; Szyszka, Reference Szyszka2011). In the case of the current study, however, the participants’ comprehensibility was not associated with their level of pronunciation-specific anxiety. These contrasting results may suggest that, irrespective of anxiety, participants may be able to attend to phonological features with sufficient accuracy to make their speech comprehensible. However, because anxiety is known to interfere with attention control (e.g., Piechurska-Kuciel, Reference Piechurska-Kuciel2008), high-anxiety participants may not have been able to allocate sufficient attention to the differentiation of L1 and L2 sounds when speaking.
With respect to motivation, neither Ideal nor Ought-to L2 self was linked to comprehensibility or accentedness. This provides counterevidence to past studies that have found a strong association between Ideal L2 self and comprehensibility (e.g., Saito et al., Reference Saito, Dewaele, Abe and In'nami2018). At the same time, a small but positive link was found between Ideal L2 self and recent L2 learning outside of the classroom (r = .226, p = .054, see Supporting Information III). Although this link did not reach the threshold of statistical significance, it may nevertheless suggest that participants with internalized motivation may have actively sought out opportunities to practice English outside of the classroom (e.g., Saito et al., Reference Saito, Dewaele, Abe and In'nami2018; Ushioda, Reference Ushioda2016). Unlike past studies, which have used questionnaires on English learning in general, the current study tailored the statements to elicit responses specific to pronunciation (e.g., Baran-Łucarz, Reference Baran-Łucarz2017). Thus, further research is needed in order to confirm the relationship between pronunciation-specific motivation and L2 pronunciation acquisition.
R2: Roles of instruction in learner individual differences
The second aim of the study (Study 2) was to examine the extent to which the relationship between IDs and proficiency varied over time following explicit pronunciation instruction. The results showed that there were no significant interactions between any ID variables and instructional gains. This runs counter to prior evidence showing that aptitude moderates the effectiveness of, for example, L2 grammar instruction (e.g., Yalçın & Spada, Reference Yalçın and Spada2016). The findings rather suggest that instruction is facilitative of L2 pronunciation development regardless of learners’ ID variables. Unlike L2 grammar instruction, wherein learners need to process abstract and complex concepts of language, L2 pronunciation learning is mainly a perceptual-motor phenomenon. In this regard, the results support the view that the explicit explanation of L2 pronunciation features (i.e., articulatory-based and auditory-based instruction) may be equally beneficial for learners with various aptitude, motivation, and anxiety profiles (e.g., Couper, Reference Couper2003).
Conclusion
The current study addressed the complex relationships between learner IDs and L2 pronunciation learning in the EFL classroom setting. Grounded in the view that pronunciation proficiency is a multi-dimensional construct with interrelated components (Saito & Plonsky, Reference Saito and Plonsky2019), we employed two holistic measurements of L2 proficiency (i.e., comprehensibility and accentedness) to illustrate their interconnectivity and interaction with an array of learner IDs. The results speak to the complex role of IDs in shaping the course of L2 pronunciation acquisition. First and foremost, the findings suggest that the extensive use of a target language greatly promotes the development of L2 comprehensibility and accentedness. In the context of the current study (i.e., English-as-a-Foreign-Language), such experience-related factors include the amount of language-focused practice inside classrooms and conversational interactions with users of English outside classrooms. When it comes to linguistic nativelikeness (accentedness), however, further improvement can be observed only among certain individuals with greater phonemic coding ability and lower levels of anxiety towards L2 pronunciation learning. The absence of any links between IDs and instructional gains suggests that pronunciation-focused instruction is effective for L2 learners regardless of their ID profiles.
As for its theoretical contribution, the current study is the first attempt to extend the integrative framework of SLA to L2 pronunciation in EFL classroom contexts. Echoing the fundamental tenet of DST and the trilogy of mind, the study provides a comprehensive picture of the complex relationship between use, learner individual differences, and language development. On a broad level, our findings indicate that whereas socio-psychological individual differences are tied to use (e.g., greater motivation leads to more practice inside and outside classrooms), cognitive aptitude serves as a factor in advanced L2 acquisition. In the field of L2 pronunciation, however, we add that one domain-specific crucial source of individual variation concerns the dimensions of proficiency, i.e., comprehensibility vs. accentedness. As for comprehensibility, which many scholars consider an index of a functional user of L2 English (e.g., Derwing & Munro, Reference Derwing and Munro2013), it is highly likely that more L2 practice leads learners to become more comprehensible. As for accentedness, which has been claimed to represent an ideal (but not necessarily realistic) goal of L2 speech learning, foreign accent reduction can be an extremely difficult task, especially among post-pubertal learners, and may be limited to certain individuals with high-level cognitive aptitude (Linck et al., Reference Linck, Hughes, Campbell, Silbert, Tare, Jackson, Smith, Bunting and Doughty2013).
To close, several limitations of the study need to be acknowledged. First of all, the participants’ L2 experience profiles were surveyed using a questionnaire (i.e., Language Contact Profile). Although the use of self-report data is common in SLA (cf. Derwing & Munro, Reference Derwing and Munro2013), it may not accurately reflect participants’ actual language exposure. Therefore, the findings related to L2 experience in this study need to be treated as tentative. More accurate measurements of the quantity and quality of L2 experience should be obtained in future studies by, for example, asking participants to track their L2 interactions using their mobile phones (Surtees, Reference Surtees2013) or using electronic language logs (Ranta & Meckelborg, Reference Ranta and Meckelborg2013).
Secondly, we would like to emphasize that the findings in the current study need to be replicated and verified. Following previous L2 pronunciation studies (e.g., Saito et al., Reference Saito, Suzukida and Sun2019), we used the LLAMA test to gauge participants’ foreign language learning aptitude. However, because several scholars have recently cast doubt on the reliability of this battery (e.g., Bokander & Bylund, Reference Bokander and Bylund2020), the results need to be treated with some caution. In addition, as we illustrated in the literature review, there is a wealth of influential aptitude tests such as the CANAL-F test (Grigorenko et al., Reference Grigorenko, Sternberg and Ehrman2000) and Hi-LAB (Doughty et al., Reference Doughty, Campbell, Mislevy, Bunting, Bowles and Koeth2010) that can be employed as research tools. In order to confirm the relationship between L2 pronunciation and aptitude, it is thus important to replicate the study with more reliable aptitude measures.
Thirdly, the current study used a single speaking task (i.e., a picture description task) to evaluate participants’ L2 pronunciation performance. However, it has been recognized that speaking style and the type of L2 knowledge used in L2 speech (i.e., controlled vs. spontaneous knowledge) vary depending on the nature of tasks and the conditions of their administration (e.g., controlled vs. semi-structured vs. fully free tasks). Because of this, it is crucial for future studies to assess speakers’ performance using multiple speaking tasks (see Saito & Plonsky, Reference Saito and Plonsky2019 for more discussion of task type in relation to L2 declarative knowledge).
Fourthly, we acknowledge that pronunciation skills, the main focus of this study, comprise only one aspect of general L2 proficiency. More studies are needed to assess whether, to what degree, and how aptitude, motivation, and emotion mediate L2 pronunciation improvement, and how this ultimately impacts the process and product of general L2 learning. For example, to unpack the relationships between IDs and general proficiency, it would be interesting to examine the generalizability of our findings to reading, listening, writing, grammar, and vocabulary learning. Future studies should also develop, validate, and refine theoretically sound methods to tap into the highly complex nature of L2 general proficiency.
Lastly, the purpose of the current study was to capture the complex relationship between different IDs in relation to L2 pronunciation proficiency. However, we acknowledge that we only covered a small number of key IDs. In order to fully apply the principle of DST and provide a fuller picture of the relationship between IDs and L2 pronunciation development, future studies should include as many factors as possible, including working memory (e.g., Hu et al., Reference Hu, Ackermann, Martin, Erb, Winkler and Reiterer2013), musical aptitude (Li & DeKeyser, Reference Li and DeKeyser2017), and personality (e.g., Hu & Reiterer, Reference Hu, Reiterer, Dogil and Reiterer2009).
Future direction
In the current project, we aimed to track the relationship between IDs and L2 learning over time (e.g., Serafini, Reference Serafini2017) via both cross-sectional and longitudinal investigations. Although participants’ ID profiles were examined only once at the beginning of the project, we would like to emphasize that ID factors (especially those related to the sociopsychological dimensions of L2 learners) can be considered dynamic (rather than stable) phenomena. There is ample evidence demonstrating fluctuations in L2 learners’ motivation (e.g., Pawlak, Reference Pawlak2012; Waninge, Dörnyei & De Bot, Reference Waninge, Dörnyei and De Bot2014) and state anxiety (Gregersen, 2020). There has also been an ongoing debate among aptitude researchers on the malleability of language aptitude (e.g., Kormos, Reference Kormos, Granena and Long2013; Singleton, Reference Singleton2017; Wen, Biedroń & Skehan, Reference Wen, Biedroń and Skehan2017). To this end, we call for future research that examines the ever-changing nature of L2 learners’ various IDs and their impact on L2 speech learning at different time points over an extensive period of L2 immersion and classroom instruction.
Acknowledgments
This research was supported by the Language Learning Dissertation Grant. We would like to thank Andrea Revesz, Joan Carles Mora, and Viktoria Magne for their constructive feedback. We also thank Talia Isaacs, Pavel Trofimovich, Hui Sun, and Xiaojun Lu for their assistance and feedback in the process of data collection and analysis.
Supplementary Material
For supplementary material accompanying this paper, visit https://doi.org/10.1017/S1366728922000700