Introduction
Lisa: Did you know that the Chinese use the same word for “crisis” as they do for “opportunity”?
Homer: Yes. Crisitunity!
- The Simpsons, Season 6 Episode 11.
Though the above quote is inaccurate with regard to the Chinese language and intended for comedy, the underlying idea is a sound one for research. Crises can create opportunities for those wise and wary enough to use them. The validation crisis noted in the recent paper by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) arises from empirical and theoretical problems with the second language (L2) Motivational Self System (L2MSS) as it has been measured thus far, and not just on one occasion. Their work indicated that many of the constructs did not adequately differentiate from one another, a failure of what is known as discriminant validity. The authors have raised the alarm that failing to adequately distinguish constructs may lead researchers to invalid and unjustified conclusions, requiring a complete bottom-up methodological and theoretical overhaul of the theory before new substantive conclusions can be drawn. With this overhaul, the crisis can be turned into an opportunity for improving the general strength and hygiene of research on the psychology of language learning. The validation crisis, better thought of now as a crisitunity, requires a fundamental reconsideration of the substantive elements of the L2MSS and a rigorous methodological overhaul to match a growing paradigm shift.
A key marker of the need for change can be seen in the attention given to the results and findings. The current article is the third response to this particular article (Henry & Liu, Reference Henry and Liu2024; Papi & Teimouri, Reference Papi and Teimouri2024). While most empirical work does not generate this level of discourse, the challenge issued by the original article and the voices raised in nuanced analysis of that challenge (Henry & Liu, Reference Henry and Liu2024) and defense of the status quo (Papi & Teimouri, Reference Papi and Teimouri2024) are indicative of a Kuhnian paradigm shift (Kuhn, 1962/Reference Kuhn2012) in the field of the psychology of language learning. To wit, in Kuhn’s (1962/Reference Kuhn2012) description of scientific revolutions, scientific progress is not linear but consists of periods of “normal science” guided by an established paradigm of thought and practices, punctuated by “scientific revolutions” triggered by empirical and theoretical anomalies—like those found by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) and Henry and Liu (Reference Henry and Liu2023, Reference Henry and Liu2024). These revolutionary periods are not mere advancements of knowledge but paradigm shifts involving a complete rethinking of the practices and approaches in the field, often facing resistance from the established scientific community due to their disruptive nature. This resistance, mirroring the power struggles seen in political revolutions, highlights the political nature inherent in these scientific upheavals. As it has done before (see Dörnyei & Ryan, Reference Dörnyei and Ryan2015), the psychology of language learning has reached the point of evidentiary, theoretical, and political need for a paradigm shift.
Al-Hoorie et al.’s (Reference Al-Hoorie, Hiver and In’nami2024) work represents the culmination of a critical set of findings, the tipping point where business-as-usual “normal science” has become untenable. Though the opposing voices are engaged in some hair-splitting, angels-on-pinheads reanalysis, and commentary on the work (Papi & Teimouri, Reference Papi and Teimouri2024), the results follow a pattern of findings and theoretical analyses indicating that the time for change has come. To date, some work has indicated the need for theoretical overhaul (Henry & Liu, Reference Henry and Liu2023, Reference Henry and Liu2024); other researchers have sought to expand the theory with greater care for obviously missing parts of the model (Papi et al., Reference Papi, Bondarenko, Mansouri, Feng and Jiang2019; Teimouri, Reference Teimouri2017; Thorsen et al., Reference Thorsen, Henry and Cliffordson2017); still others have shown further failure to replicate the model as originally hypothesized (Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2020a). Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) have raised the alarm and warned of the possibility of propagating errors throughout the field of the psychology of language learning. Given the state of flux in the L2 self-system both substantively and methodologically, the Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) paper rightly calls for a moratorium on substantive studies of L2 selves until these methodological issues can be effectively resolved.
In this commentary, I wish to show how Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) have identified the correct response in a substantive moratorium on the use of the L2 selves, survey some past problems with the literature to highlight the need for greater theoretical and methodological clarity, and present ways forward that may allow the field of motivational psychology in language learning to continue to include the concept of an L2 self. Recognize that a substantive moratorium does not call for the topic to be abandoned altogether; rather, there is a need to first get the methods and theory into a workable state before any meaningful educational and psychological findings can be accepted.
Methodological struggles in L2 selves literature
Whether or not we agree with every aspect of the interpretation, Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) have indisputably shown that the measurement of the L2 Motivational Self System as practiced to date has several underlying problems. This is not the first time research has indicated fundamental shortcomings; Al-Hoorie’s (Reference Al-Hoorie2018) meta-analysis showed that the key variables of the model (ideal self, ought-to self, and L2 experience) displayed wide heterogeneity in their relationships with language achievement. As a meta-analysis surveying the entire field up through 2018, this study should be given a fair degree of weight. More recently, Hiver & Al-Hoorie (Reference Hiver and Al-Hoorie2020a) used preregistration to show that one of the largest studies to test this theory (You et al., Reference You, Dörnyei and Csizér2016) failed to replicate.
For the latter study, this was actually unsurprising: many of the fit indices and other standard empirical markers of quality in the original study by You et al. (Reference You, Dörnyei and Csizér2016) sat at the borderline of the cutoffs used in structural equation modeling. In normal models this would not be problematic, save that with the very large (>10,000) sample in this initial study, common goodness-of-fit indices are often less sensitive to model misspecifications (Marsh et al., Reference Marsh, Wen and Hau2004). They were thus likely inflated and, given their borderline nature, might have failed with a smaller sample. Additionally, some of the correlations exceeded .8, indicating collinearity issues above the level noted by Dörnyei (Reference Dörnyei2007) and within the range for lack of differentiation noted by Papi and Teimouri (Reference Papi and Teimouri2024). Thus, though the original study verged on acceptable, it was quite probably misspecified from the start, and the preregistered replication published in 2020 was sorely needed.
In truth, studies using the L2MSS and failing to demonstrate adequate methodological rigor are unfortunately quite prevalent; the highlighted studies by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) and Hiver & Al-Hoorie (Reference Hiver and Al-Hoorie2020a) are exceptional for their use of methods that promote a more parsimonious interpretation of the constructs. While there are debatable aspects of Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024), including the choice of a .6 collinearity cutoff questioned by Papi and Teimouri (Reference Papi and Teimouri2024), these decisions were in fact justified within the literature. I note here that the .6 cutoff came from Dörnyei’s (Reference Dörnyei2007) own contention, making the correlations of supposedly different constructs exceeding .8 originally found in You et al. (Reference You, Dörnyei and Csizér2016) additionally troubling.
These findings can be seen as extensions of errors compounded early in the research paradigm. The initial studies (Taguchi et al., Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009) that created the instruments most often used in L2MSS research skipped establishing a measurement model prior to building a structural model, a failure of basic modeling practice (Kline, Reference Kline2023), and hence many of the following studies have struggled to replicate. They further relied heavily on error correlations, a practice often frowned upon in structural equation modeling (Kline, Reference Kline2023; MacCallum et al., Reference MacCallum, Roznowski and Necowitz1992). The problem with correlated errors (likely modeled post hoc through the use of modification indices) is that the model then relies less on the initial hypothesized theory and more on potentially unseen confounding factors (see Al-Hoorie & Hiver, Reference Al-Hoorie and Hiver2024, for a full discussion). Though Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009) tested their survey instruments across three cultural contexts, each context had different correlated errors, indicating discriminant validity problems from the outset. The use of highly differing correlated errors in each of the cross-sectional samples, without note or explanation of why these residual terms might share variance (e.g., similar wordings in the language used, cultural connections in ideas, proximal placement on the survey), means that the constructs are not as clean or clear as we might desire when describing the target qualia.
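To make the mechanics concrete, the sketch below fits a one-factor model to simulated items that secretly share method variance, then frees a single post hoc error covariance of the kind a modification index would suggest. The data are invented, and the semopy package is my own choice for illustration, not the software used in any of the studies discussed; the point is only that fit improves while the theory has not.

```python
# A minimal sketch (simulated data; semopy assumed as the SEM package):
# a single post hoc correlated error can rescue the fit of a misspecified
# one-factor model without any improvement in the underlying theory.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(42)
n = 500
factor = rng.normal(size=n)        # the intended latent construct
method = rng.normal(size=n)        # unmodeled shared variance (e.g., wording)

items = {}
for i in range(1, 5):
    shared = 0.7 * method if i >= 3 else 0.0   # x3 and x4 share method variance
    items[f"x{i}"] = 0.8 * factor + shared + rng.normal(scale=0.6, size=n)
data = pd.DataFrame(items)

# Hypothesized model: one clean factor, no error correlations
base = semopy.Model("F =~ x1 + x2 + x3 + x4")
base.fit(data)

# "Thumb on the scale": free the error covariance a modification index would flag
patched = semopy.Model("""F =~ x1 + x2 + x3 + x4
x3 ~~ x4""")
patched.fit(data)

for label, model in [("without error correlation", base), ("with error correlation", patched)]:
    print(label)
    print(semopy.calc_stats(model).T)   # chi-square, CFI, TLI, RMSEA, etc.
```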
Further, no measurement invariance tests were used to ensure that the constructs were represented by the same structures in all three locations. As Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009) is a method validation report, this raises serious questions regarding the validity of the item structure across localities (DeVellis & Thorpe, Reference DeVellis and Thorpe2021; Hirosawa et al., Reference Hirosawa, Kono and Oga-Baldwin2024). Given the differences in wordings beyond basic emic considerations for cultural and local understanding, this is somewhat understandable; perhaps the items might not localize in the same way. However, these wording differences were also made without noted theoretical justification or localized validation procedures (Hirosawa & Oga-Baldwin, Reference Hirosawa, Oga-Baldwin, Butler and Huang2022), leaving us open to speculate that the wordings chosen were simply based on the judgment of the authors. The models are therefore not directly comparable due to differences in item content and quantity. The constructs thus cannot truly be said to represent the same things across any of the locations studied, which may explain the problems found in L2MSS studies.
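A rough screen can flag this kind of structural non-equivalence even before formal testing. The sketch below, using simulated two-group data, compares per-group first-principal-component loading patterns with Tucker's congruence coefficient; it is a heuristic first pass, not a substitute for configural, metric, and scalar invariance testing in a multigroup confirmatory factor analysis.

```python
# A rough screen for cross-group structural equivalence (simulated data;
# this heuristic does not replace formal configural/metric/scalar
# invariance testing in a multigroup CFA).
import numpy as np

rng = np.random.default_rng(1)
n = 300

def simulate_group(loadings: np.ndarray) -> np.ndarray:
    """Items generated from one factor with the given loadings, plus noise."""
    factor = rng.normal(size=(n, 1))
    return loadings * factor + rng.normal(scale=0.7, size=(n, len(loadings)))

def first_pc_loadings(X: np.ndarray) -> np.ndarray:
    """Item loadings on the first principal component of the item correlations."""
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    return loadings if loadings.sum() >= 0 else -loadings  # fix arbitrary sign

def tucker_phi(a: np.ndarray, b: np.ndarray) -> float:
    """Tucker's congruence coefficient; values below ~.95 flag dissimilarity."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

group_a = simulate_group(np.array([0.8, 0.8, 0.7, 0.7, 0.6]))
group_b = simulate_group(np.array([0.8, 0.3, 0.7, 0.2, 0.6]))  # two items drift

phi = tucker_phi(first_pc_loadings(group_a), first_pc_loadings(group_b))
print(f"Tucker's phi across groups = {phi:.3f}")
```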
Though this study was reported as a book chapter, and thus might be overlooked as evidence of general acceptance of weaker-fitting models throughout the L2MSS paradigm, it served as the first instrument creation for widespread empirical testing of the theory, which has led to more mainstream examples of methodological inadequacy. It is cited over 1,500 times according to Google Scholar, indicating that misspecified studies may now number in the thousands. The practice of (over)use of modification indices and correlated errors has continued in other well-cited research (e.g., Papi, Reference Papi2010). Updates to the models and instruments have been largely based on refining and expanding those presented by Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009); work by Teimouri (Reference Teimouri2017, 2018) and Papi and colleagues (2018) has updated some of these ideas, though it may also be compounding some of these issues of methodological fuzziness. There have thus been problems with instrumentation and representation of phenomena from the outset, which have propagated down through the research paradigm.
In one of the more egregious examples, Moskovsky et al. (Reference Moskovsky, Assulaimani, Racheva and Harkins2016) based their instruments on the work by Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009) and presented their findings as a structural model without actually completing one; the work used basic multiple regression without testing structural model fit. It is worth noting here that a composited regression model and a structural model should not be represented in the same fashion. Far worse, the authors failed to recognize that their variables used a Likert-type scale in which 1 was “higher” and 5 was “lower” (the opposite of Taguchi et al., Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009, which read: 1 [Strongly disagree] to 6 [Strongly agree] [p. 90]). They then used these variables to predict International English Language Testing System (IELTS) scores, which rate in the opposite direction (1 = low, 9 = high). This would explain the negative relationship between the motivational variables and achievement as an erroneous artifact; the study is otherwise unique in indicating that lower quantity and poorer quality of motivation lead to better learning. A far more parsimonious interpretation is that the authors did not appropriately transform their data to align their variables; given the other methodological shortcuts found, this seems quite likely.
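The suspected artifact is easy to demonstrate. In the simulation below (all values invented, none drawn from the actual dataset), a genuinely positive motivation–proficiency relationship appears negative simply because the reverse-keyed survey was never recoded.

```python
# The coding artifact in miniature (all values simulated, none from the
# actual study): a truly positive motivation-proficiency relationship
# turns negative when a reverse-keyed survey is left untransformed.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
motivation = rng.normal(size=n)    # latent motivation, high = more motivated

# Proficiency on an IELTS-like band where 9 = high
ielts = 5.5 + 1.2 * motivation + rng.normal(scale=0.8, size=n)

# Reverse-keyed survey: 1 = "strongly agree" (high), 5 = "strongly disagree" (low)
survey = np.clip(np.round(3 - 1.2 * motivation + rng.normal(scale=0.7, size=n)), 1, 5)

r_raw = np.corrcoef(survey, ielts)[0, 1]
r_recoded = np.corrcoef(6 - survey, ielts)[0, 1]  # 6 - x flips a 1-5 scale
print(f"uncorrected r = {r_raw:.2f}; after reverse-coding r = {r_recoded:.2f}")
```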
Moskovsky et al. further omit a large amount of necessary reporting (such as how the correlations were computed in the absence of a structural model, or what statistical package was used for the analyses), while relying on extremely low Cronbach’s alpha statistics (.59–.69) to indicate factor unity rather than running proper factor analyses. The failure to account for basic math and methods in the article is quite disappointing given the number of citations the article has generated (246 on Google Scholar at the time of writing); the authors have not responded to repeated inquiries regarding the paper. Its inclusion in one of the top journals in our field indicates an unfortunate acceptance of weak methods when they fit an accepted paradigm (Kuhn, 1962/Reference Kuhn2012); at least two reviewers and the journal editors had to read and sign off on these results. In an ideal world, this paper would likely be retracted.
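Because the argument leans on alpha, it is worth recalling what the statistic can and cannot show. The sketch below computes alpha directly from its definition on simulated data and demonstrates that a six-item scale built from two unrelated factors can still return alpha in the very range reported (.59–.69); alpha at that level cannot certify factor unity.

```python
# Cronbach's alpha computed from its definition (simulated data): a six-item
# "scale" drawn from two unrelated factors still returns alpha in the .6
# range, so alpha at that level says nothing about unidimensionality.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(7)
n = 400
f1, f2 = rng.normal(size=(2, n))   # two uncorrelated latent factors
cluster1 = np.column_stack([0.8 * f1 + rng.normal(scale=0.6, size=n) for _ in range(3)])
cluster2 = np.column_stack([0.8 * f2 + rng.normal(scale=0.6, size=n) for _ in range(3)])
scale = np.hstack([cluster1, cluster2])   # a "single" 6-item scale

print(f"alpha = {cronbach_alpha(scale):.2f}")  # around .65 despite two dimensions
```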
In the interest of balance, improvements and theoretical expansions have indeed come through in the intervening years. Some recognition should be given to the work by Papi and colleagues (2018) for updating the scales from the prior models and expanding the theoretical components. While these empirical studies show some improvements, they have not remedied the deeper theoretical problems with the conceptualizations. Indeed, these substantive findings unfortunately stand on shaky theoretical grounds. One major issue in the coming research paradigm will be continuing to ameliorate the statistical and methodological shortcomings.
Theoretical fuzziness in conceptualization
Beyond these existing empirical issues, comments have been made regarding the theoretical consistency of the components of the theory. Henry and Liu (Reference Henry and Liu2023) have indicated how the L2 motivational self system fails to represent a self-system according to most of the common definitions used by social psychologists; their response commentary highlights the jingle–jangle problems and overlaps within the formulations as they currently exist (Henry & Liu, Reference Henry and Liu2024). The L2 motivational self system may represent a set of phenomena common to language learners, but may not comprise a complete system of selves in the grander organismic self-regulatory sense (Henry & Liu, Reference Henry and Liu2023). The fundamental philosophical and theoretical conceptualizations of the L2MSS thus stand on shaky ground.
Another aspect of the theory requiring attention is the definitions and terminologies used. With regard to the notion of the L2 learning experience (Dörnyei & Ryan, Reference Dörnyei and Ryan2015; Dörnyei, Reference Dörnyei2019), we can note that, in jingle fashion, this variable is also sometimes interchangeably called attitudes to the L2 (Taguchi et al., Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009; Moskovsky et al., Reference Moskovsky, Assulaimani, Racheva and Harkins2016, etc.), indicating a terminology issue that has not been resolved within the theoretical community. This variable has been identified as the best predictor of L2 outcomes, including some measures of L2 learning (Al-Hoorie, Reference Al-Hoorie2018; Dörnyei, Reference Dörnyei2019) and the problematic “intended effort” variable. As noted by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024), intended effort is a troubling construct, both because it is quite close in wording to the enjoyment of learning and because the gulf between intentions and actions often goes uncrossed (Oga-Baldwin et al., Reference Oga-Baldwin, Fryer and Larson-Hall2019). Unpacking the L2 learning experience based on its indicators reveals an even deeper issue. Many of the indicators share similar wording and meaning with measures of intrinsic motivation (Noels et al., Reference Noels, Pelletier, Clément and Vallerand2000; Oga-Baldwin & Nakata, Reference Oga-Baldwin and Nakata2017; Ryan & Connell, Reference Ryan and Connell1989). Table 1 shows the similarities and overlaps between the L2 learning experience and more widely used measures of intrinsic motivation.
[Table 1 near here: item wordings from L2 learning experience scales alongside widely used intrinsic motivation measures.]
* While English is used in the originals, any language could potentially be inserted here.
Granted, there are slight differences in the wording, and an empirical comparison is indeed warranted. At the same time, these wordings show a high degree of overlap; perhaps enough to merit serious questioning regarding jangle similarities.
Without question, both scales indicate feelings of interest, positivity, and enjoyment in learning a foreign language. As noted, the L2 learning experience is the “Cinderella” variable of the L2MSS (Dörnyei, Reference Dörnyei2019). At the same time, intrinsic motivation is a well-recognized construct running across many fields of study (Ryan & Deci, Reference Ryan and Deci2017). It is perhaps for these reasons, among others, that Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) proposed self-determination theory as an alternate generalized model for explaining language learning motivation: theoretical parsimony would indicate that if the strongest variable in the L2MSS is substantively the same as intrinsic motivation, it is reasonable to surmise that the model may duplicate effects found in a theory where intrinsic motivation plays a central role. Many of the findings (Al-Hoorie, Reference Al-Hoorie2018) may in fact be inadvertently replicating the effects of intrinsic motivation.
Proponents of the L2 selves models have also indicated that the external-to-internal continuum represented by self-determination theory (Ryan & Deci, Reference Ryan and Deci2017) may underlie the structure of the ideal and potential varieties of ought-to L2 selves. Teimouri (Reference Teimouri2017) nicely charted this continuum, placing the ought-to selves as language-specific versions of external motivations, while noting that the ideal L2 self is similar to intrinsic motivation. From the perspective of parsimony, if these variables can be charted on the same continuum, then they are likely best represented by the more universally recognizable formulations, which also present a strong set of corollary hypotheses and measurable effects, rather than by theoretically tortuous and incomplete constructions with tenuous internal and external validation.
Resolving the jingle, the jangle, and the crisis
Jingle–jangle issues are well-recognized in both education and psychology (Skinner, Reference Skinner2023a, Reference Skinner, Bong, Reeve and Kim2023b). Throughout our fields, we find psychometric fuzziness, theoretical overlaps, and, yes, even some “thumb-on-the-scale” practices like the overuse of modification indices and correlated errors to make poorly fitting models look acceptable. The presence of poor practice elsewhere does not excuse overlooking it within our own field. On the contrary, it makes appropriate modeling all the more crucial; we can do better.
As Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) have indicated, the scales that have been developed for use with the L2 Motivational Self System show rather extreme problems. Whether or not they represent overlapping constructs with theoretically unique contributions—à la the oft-noted jingle–jangle of self-concept and self-efficacy (Marsh et al., Reference Marsh, Pekrun, Parker, Murayama, Guo, Dicke and Arens2019)—is a moot point, given the lack of theoretical consistency with other psychological concepts (Henry & Liu, Reference Henry and Liu2023, Reference Henry and Liu2024). The data indicate that where theory says we should expect two or more constructs, we instead often find only one. A look through the data (kindly made available via open science repositories) confirms that the factor analytic procedures used were correct. Independent confirmation showed that measures of incremental and absolute fit for some of the most widely used scales do not support these scales as neatly unidimensional; to the contrary, many are theoretically uninterpretable.
Though Papi and Teimouri (Reference Papi and Teimouri2024) provide their own reanalysis of the Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) data, these reinterpretations are theoretically tortuous and lack the parsimony of Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024). A number of the contentions by Papi and Teimouri (Reference Papi and Teimouri2024) regarding supposed discriminant validity rely on post hoc partial correlations of the constructs, seemingly not based on latent factor models. This is not a preferred solution: latent variable models retain the information carried in the error terms (Kline, Reference Kline2023), while composite mean structures do not. Partial correlations of theoretical constructs thus cannot supersede the more stringent factor analyses carried out by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024); partial correlation analysis alone is not a generally accepted method for determining discriminant validity (DeVellis & Thorpe, Reference DeVellis and Thorpe2021). Further, these post hoc correlation tests would seem to replicate the “modeling by modification index” problems found in Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009). Though some might think the .6 cutoff for construct overlap too loose, the failure of the factor analytic procedures to reproduce the hypothesized structure in both You et al. (Reference You, Dörnyei and Csizér2016) and Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009) confirms the problematic nature of the L2MSS instruments.
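The statistical point deserves to be made concrete. In the simulation below (the latent correlation, loadings, and reliabilities are hypothetical choices, not estimates from any study discussed here), measurement error attenuates the correlation between two composite means well below the latent correlation, which is precisely how composite-level analyses can make overlapping constructs appear distinct.

```python
# Why composite-score correlations understate latent overlap (simulated;
# the latent correlation and loadings are hypothetical): with composite
# reliability of about .8, a latent correlation of .85 surfaces as an
# observed correlation of roughly .68.
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 4
latent_r = 0.85
factors = rng.multivariate_normal([0, 0], [[1, latent_r], [latent_r, 1]], size=n)

def composite_mean(factor: np.ndarray) -> np.ndarray:
    """Mean of k noisy items loading on the factor (reliability ~ .8)."""
    items = 0.7 * factor[:, None] + rng.normal(scale=0.7, size=(n, k))
    return items.mean(axis=1)

x = composite_mean(factors[:, 0])
y = composite_mean(factors[:, 1])
print(f"latent r = {latent_r:.2f}; observed composite r = {np.corrcoef(x, y)[0, 1]:.2f}")
```

A latent-variable model estimates the .85 directly; correlations (partial or otherwise) among composites sit noticeably lower, so a composite correlation falling below a cutoff cannot establish that two constructs are distinct.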
The conclusions warrant a substantive moratorium on the use of these scales, as called for by the authors, until improved measures are developed. Stopping the propagation of misleading constructs is the heart of this measurement crisitunity; it is not a frivolous call to panic, as suggested by claims of a “manufactured” crisis (Papi & Teimouri, Reference Papi and Teimouri2024), but rather a well-reasoned request to improve the quality of research in the psychology of language learning. Note that this substantive moratorium is not a call to cease all research on the topic, but rather a call to first address the theoretical (Henry & Liu, Reference Henry and Liu2023) and methodological problems before attempting to draw any unjustifiable practical conclusions. A moratorium of this type is achieved only by consensus among researchers; we as a field can hopefully agree that the L2 Motivational Self System requires an overhaul before we return to a period of “normal science” (Kuhn, 1962/Reference Kuhn2012). While this may be shocking and frustrating to researchers with projects underway, there are several reasons to retain hope.
1. Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) represent a single (though strong) data point. Through a meta-analytic lens (Kline, Reference Kline2019), projects currently in progress (ethics cleared, data gathered, analysis underway) may yet find results contrary to those reported by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024). Some of the substantive papers might be pivoted into validation papers. If the results of these hypothetical studies make use of up-to-date methods with longitudinal designs, appropriate sample sizes, open data sharing, and no post hoc repurposing, they are to be welcomed. At the same time, given that findings incongruous with the theory as developed appear to be the norm (cf. You et al., Reference You, Dörnyei and Csizér2016; Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2020a), studies indicating agreement with the theory may well be the outliers (e.g., Papi et al., 2018). Whether the current theory remains valid is an empirical question, one only answerable through good methods and open science.
2. Authors of studies that found differing but equally uninterpretable results using the L2MSS can also find succor; you did nothing wrong. The baseline instruments were problematic from the start. These instruments can be safely reexamined and reconstructed using approaches similar to those of Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024). Perhaps the work of Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) can provide a theoretical framework for interpretation—and a future chance to preregister models likely to work.
3. Moving forward, preregistered models and hypotheses will receive preference in interpretation (Liu et al., Reference Liu, Chong, Marsden, McManus, Morgan-Short, Al-Hoorie, Plonsky, Bolibaugh, Hiver, Winke, Huensch and Hui2023). Ongoing research like that noted above, which uses open science approaches, will not be susceptible to the same post hoc rerationalization. Several preregistered models that perfectly replicate the L2 self variables as originally hypothesized, without relying on post hoc modifications to work, might be sufficient to overcome the crisis.
4. Caveat: simply because the instruments and theories do not align when run through factor analytic procedures does not mean that factor analysis is the wrong method. This argument, though sometimes made, is akin to blaming concrete as a building material whenever any house falls down, regardless of the circumstances. Indeed, it may not be ideal in all circumstances, but when used correctly it forms a very stable foundation. At the same time, other methods do exist for determining the dimensionality of constructs (see DeVellis & Thorpe, Reference DeVellis and Thorpe2021), and these can also be used. Such methods (e.g., confirmatory composite analysis; Alamer et al., Reference Alamer, Schuberth and Henseler2024) still rely on similar metrics and heuristics for ensuring discriminant validity, and these metrics and their cutoffs do not sidestep the problems (e.g., multicollinearity and theoretical overlap) indicated by Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) and Henry and Liu (Reference Henry and Liu2024); a sketch of one such heuristic follows this list.
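One such heuristic is the heterotrait–monotrait (HTMT) ratio of correlations from the composite-modeling tradition. The sketch below computes it on simulated items (an illustration under assumed loadings, not data from any L2MSS study); when two item sets largely measure the same thing, HTMT approaches 1, far above common cutoffs of roughly .85 to .90.

```python
# The heterotrait-monotrait (HTMT) ratio, one widely used discriminant
# validity heuristic (simulated items; not data from any L2MSS study).
# Values near or above ~.85-.90 suggest two item sets may not be
# measuring distinct constructs.
import numpy as np

def htmt(X: np.ndarray, Y: np.ndarray) -> float:
    """Mean cross-construct item correlation over the geometric mean of the
    mean within-construct item correlations."""
    kx = X.shape[1]
    R = np.corrcoef(X, Y, rowvar=False)          # items of X, then items of Y
    hetero = R[:kx, kx:].mean()
    mean_offdiag = lambda M: M[np.triu_indices_from(M, k=1)].mean()
    return float(hetero / np.sqrt(mean_offdiag(R[:kx, :kx]) * mean_offdiag(R[kx:, kx:])))

rng = np.random.default_rng(11)
n = 400
shared = rng.normal(size=n)                      # two "constructs" that overlap
X = np.column_stack([0.8 * shared + rng.normal(scale=0.6, size=n) for _ in range(4)])
Y = np.column_stack([0.8 * shared + rng.normal(scale=0.6, size=n) for _ in range(4)])

print(f"HTMT = {htmt(X, Y):.2f}")                # near 1.0: no discriminant validity
```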
As implied by the possible approaches above, the moratorium/crisitunity does not necessarily mean a full stop to studies; rather, it is a temporary stop to substantive development until methods are able to catch up. The call does not go out from a position of authority, but rather in the way a witness, whistleblower, or credible informant raises an alarm about problems. It is a signal that we as a community can and should come together to ameliorate measurement problems in the field of language learning psychology.
As noted, proper scale development procedures appear to have been overlooked when creating the scales used to define the L2MSS. Though work like that done by Taguchi et al. (Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009) is laudable for its attempt at cross-cultural application and the diversity of its sample, the reliance on post hoc measures like error correlations indicates that it, at the very least, requires refinement and, more likely, an overhaul. One way around this will be the use of open science methods, including preregistration of hypotheses and methods (Liu et al., Reference Liu, Chong, Marsden, McManus, Morgan-Short, Al-Hoorie, Plonsky, Bolibaugh, Hiver, Winke, Huensch and Hui2023). It is important to remember that properly done replication and validation studies are indeed real research, and they deserve equal respect from editors and reviewers.
Moving forward, serious and fundamental conceptual, theoretical, and methodological work is needed to make the L2 Motivational Self System a reliable measure of motivation, and not simply a “flash in the pan” phenomenon, as might be indicated by its rapid rise (Boo et al., Reference Boo, Dörnyei and Ryan2015; Liu, Reference Liu2024). Many of the findings to date are unfortunately suspect due to this shaky foundation; given this situation, we cannot be certain that any of the results in this paradigm are trustworthy. For those willing to put in the work, a well-developed program of theoretical exploration and empirical verification could well form several academic careers. However, this will need to be done from first principles. For those seeking theoretical and empirical connection without the need to first lay a foundation, other established traditions may offer a path with fewer obstacles.
Other theoretical avenues
For those wishing to research motivation without the need to reset the fundamentals, there are indeed several examples of lower-hanging fruit. These are offered not as the most correct or most intuitive for language education, but rather as available options with the potential for cross-disciplinary appeal.
As a caveat, I come from a specific tradition, and as a statement of bias, I believe that this tradition has much to offer in terms of explanatory power. Philosophically and empirically, self-determination theory offers a clear model that allows for good hypothesis testing while promoting positive well-being and social responsibility (Sugita McEown & Oga-Baldwin, Reference Sugita McEown and Oga-Baldwin2019). I fundamentally agree with Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) that channeling resources toward self-determination theory can and will improve the overall landscape of L2 research. Several SDT minitheories remain under-researched (Al-Hoorie et al., Reference Al-Hoorie, Oga-Baldwin, Hiver and Vitta2022), including goal contents minitheory and causality orientations minitheory. Exploring these minitheories can help to increase the explanatory power of the larger metatheory. At the same time, this is by no means the only or optimal set of models that could be adapted for language learning research.
Other theories and constructs abound. Self-efficacy as a construct has received attention in L2 research (Fryer et al., Reference Fryer, Li, Guo, Liang and Zhong2024), but it is often researched outside the grounds on which it was conceived. Though originally developed as part of social cognitive theory (Bandura, Reference Bandura1976, Reference Bandura1986, Reference Bandura1997), it is often operationalized outside this framework, promiscuously inserted into any model that requires a competence component. This is somewhat unfortunate, as social cognitive theory (SCT) offers several powerful elements, not least a model for how learning occurs through watching others and receiving instruction (Bandura, Reference Bandura1976). As yet, though self-efficacy has seen several studies in recent years (e.g., Li & Zhang, Reference Li and Zhang2023), SCT more broadly has been only weakly explored in the L2 realm (Fryer et al., Reference Fryer, Li, Guo, Liang and Zhong2024).
At the same time, numerous researchers would prefer to have more language-specific theories. Similar to SCT, Alexander’s (Reference Alexander2003) model of domain learning (MDL) seeks to explain how learners go from novice to mastery within a specific area. Though a generalized educational theory, it is flexible enough to allow for language specialization (Parkinson & Dinsmore, Reference Parkinson and Dinsmore2019). Developmental perspectives of this nature complement those of complex dynamic systems (Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2020b), and when modeled appropriately can show how learners move through different states toward stable outcomes. A hybrid developmental model of both motivation and acquisition could be powerful indeed—and perhaps powerful beyond the language-specific domain.
A dream scenario would be attempts at theoretical comparison, competition, integration, and testing (King & Fryer, Reference King and Fryer2023). How much variance does SDT’s continuum of organismic integration from external to internal regulation explain when run against the L2 selves? Does a discrepancy between the current and ideal self really drive learners toward greater mastery in the MDL, or could that better be explained by a sense of self-efficacy? Which of these constructs might be colliders, changing the strength and direction of effects and outcomes by their presence in a model (Al-Hoorie & Hiver, Reference Al-Hoorie and Hiver2024), which are sibling constructs (Lawson & Robins, Reference Lawson and Robins2021), and which are confounders, occupying the same conceptual space? What theoretical variables might actually be parent or second-order factors with multiple effects through the proposed model? Ideally, the current measurement crisis in L2 motivation will find common ground with the ongoing crisis of theoretical diaspora in education and psychology (Skinner, Reference Skinner, Bong, Reeve and Kim2023b). How can we find useful and generalizable models that communicate across fields and contexts?
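The collider question, in particular, has a concrete statistical meaning that a brief simulation can illustrate. In the sketch below (generic simulated variables standing in for any candidate constructs), two causes that are truly independent become spuriously correlated once the analysis conditions on their common effect.

```python
# Collider bias in miniature (generic simulated variables, not measured
# L2 constructs): two independent causes become spuriously correlated
# once the analysis conditions on their common effect.
import numpy as np

rng = np.random.default_rng(5)
n = 20000
x = rng.normal(size=n)                        # cause 1
y = rng.normal(size=n)                        # cause 2, independent of x
collider = x + y + rng.normal(scale=0.5, size=n)   # a common effect of both

r_overall = np.corrcoef(x, y)[0, 1]
kept = collider > 1.0                         # e.g., sampling only high scorers
r_conditioned = np.corrcoef(x[kept], y[kept])[0, 1]
print(f"r(x, y) = {r_overall:.2f} overall; {r_conditioned:.2f} after conditioning")
```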
These are empirical questions that can be answered with good designs, good instrumentation, and research hygiene, with preregistration as the highest standard of the latter (Liu et al., Reference Liu, Chong, Marsden, McManus, Morgan-Short, Al-Hoorie, Plonsky, Bolibaugh, Hiver, Winke, Huensch and Hui2023). At present, the instrumentation for the L2MSS as used in Al-Hoorie et al. (Reference Al-Hoorie, Hiver and In’nami2024) has not achieved an acceptable level of trustworthiness, leading to the current crisitunity. Theoretically, the models developed to date require an overhaul before they can be trusted (Henry & Liu, Reference Henry and Liu2023, Reference Henry and Liu2024), lest we send future researchers down the garden path toward disappointment and dead ends. The jingle–jangle overlaps, intimated by Teimouri (Reference Teimouri2017), demonstrated in multiple uses of terminology (cf. Dörnyei & Ryan, Reference Dörnyei and Ryan2015; Taguchi et al., Reference Taguchi, Magid, Papi, Dörnyei and Ushioda2009), and indicated by highly similar item wordings (see above), must be addressed through appropriate methods. Issues of discriminant validity must be handled through factor analysis and other multivariate latent dimensional scaling procedures (DeVellis & Thorpe, Reference DeVellis and Thorpe2021; Kline, Reference Kline2023).
In the long term, dominant research paradigms are chosen through a complex combination of both evidence and political will (Kuhn, 1962/Reference Kuhn2012). Despite widespread political acceptance, the evidentiary aspect has clearly been lacking in the L2MSS. Whether it is in SDT, SCT, MDL, L2MSS, or a hybridized version of any or all of the above, all these models and theories require a preponderance of evidence for sustainable acceptance. The crisitunity offers researchers the chance to build the coming paradigm of open science and model competition.
It should be noted that this research paradigm is not alone in calls for a substantive pause for reasons of psychometric inadequacy. Given the state of the evidence in recent years, researchers have noted the need for reevaluation of several very popular theories and models, including self-regulated learning (Dinsmore & Fryer, Reference Dinsmore and Fryer2023) and even basic psychological need satisfaction and frustration as it is used in SDT (Murphy et al., Reference Murphy, Watts, Baker, Don, Jolink and Algoe2023). Reassessments and refinements are a part of science.
The process of science often requires a reestablishment of fundamentals that overturns the work of large figures in the field. William Thomson, more famously known as Lord Kelvin, originator of the Kelvin temperature scale and a giant of nineteenth-century science, wielded a famously outsized influence on science through his role in the Royal Society (Thompson, Reference Thompson1910). Despite being publicly wrong about topics as varied as the age of the Earth and Darwinian evolution (Badash, Reference Badash1989), the existence of X-rays, and the possibility of heavier-than-air flight (Thompson, Reference Thompson1910), his legacy remains even today. Between 1870 and 1890, his vortex theory of atomic structure (Thomson, Reference Thomson1867) was the dominant paradigm in chemistry and atomic physics, but it was ultimately overturned using new instrumentation, opening opportunities for later researchers such as Rutherford, Bohr, Röntgen, and Einstein to develop the physical and chemical sciences that have built the modern world. Improved measurement resolved the issues: empirical demonstration and improved theory were synthesized to become the politically accepted reality. Each well-measured but theoretically puzzling result led to new pathways for inquiry that need not be beholden to demonstrably outmoded methods and theories; the psychology of language learning now faces a similar shift.
For researchers with projects underway, this may mean a need to pivot. At the same time, if these researchers are able to make these same scales work using up-to-date psychometric approaches, their L2MSS projects are likely to reap rewards. For others, the need for fundamental restructuring will move them toward theories with improved parsimony. Until the crisis is resolved, any substantive work on the nature of the L2 Motivational Self System is as likely as not to be wrong, and thus to be overturned by competing models. Taking advantage of this crisitunity will allow researchers to make their mark and carve out new theoretical and methodological directions in the coming research paradigm.
Data availability statement
The datasets generated during and/or analyzed during the current study are not publicly available due to privacy concerns but are available from the corresponding author upon reasonable request.
Acknowledgment
This work was supported by a Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (23H00647).
Competing interest
The author declares none.