Introduction
Citation analysis, which often involves counting the number of times an article or a researcher has been cited, has been increasingly adopted as a proxy measure of scientific merit (Moed, 2005; van Raan, 2019). The use of citations as performance indicators traces its theoretical roots to the normative view, which holds that citations are made to credit scientific contributions, reflecting the intellectual footprints of publications (Aksnes et al., 2019). Such a position has been further empirically supported, with studies demonstrating a strong correlation between citation counts and other quality measures such as peers’ qualitative assessment (e.g., Bornmann & Leydesdorff, 2015; Thelwall et al., 2023). Nevertheless, applying social constructivist theory introduces a different perspective, wherein citing is a social process that engages with “struggles, rhetorics, tactical and strategic games” (Aksnes, 2005, p. 14). The theory brings to light the external attributes rather than the inner quality of the cited article, as many researchers would essentially employ citations as a tool to fulfill their needs, i.e., supporting their claims and persuading their readers. Citers’ motives are intricately tied to their perceptions, which can differ from one person to another, thereby adding more complexities to the overall picture of citation practices (Tahamtan & Bornmann, 2019). Given that the normative structure of science can reveal only part of the citation dynamics, it is worth investigating from multiple perspectives why some articles are cited more than others. As early as 1972, Garfield suggested that citation frequency is a function of many influencing factors besides scientific merit.
Subsequently, an extensive literature has discussed the value of citation counts as an indicator of research quality while examining their potential drivers across disciplines (see Kousha & Thelwall, 2024; Tahamtan et al., 2016). Despite the ease of their computation and application, citations have nevertheless been critiqued for being, inter alia, discipline-specific and influenced by numerous extrinsic factors (Mammola et al., 2022).
Bibliometrics has gained traction as an umbrella term in information sciences, the application of which employs quantitative analysis of bibliographic data to document the characteristics and trajectories of a particular field or discipline (Aryadoust et al., 2020; Zakaria & Aryadoust, 2023). This methodological approach helps synthesize research in a systematic and objective way. It also expands the scope of secondary studies by analyzing metadata from publications and offers more precise methods for evaluating and mapping research (Chong & Plonsky, 2023). There is a growing body of bibliometric studies in applied linguistics (Plonsky, 2023), but they primarily focus on revealing the field’s research foci or mapping its developmental patterns (e.g., Dong et al., 2022; Lei & Liu, 2019; Meihami & Esfandiari, 2024; Zhang, 2020). A conspicuous gap remains, as these studies overlook the role of highly cited articles in shaping the overall bibliometric landscape, as well as the factors contributing to high citations. In fact, the distribution of citations is extremely skewed, with the majority of academic articles never or rarely cited in subsequent studies and only a small core of them frequently cited (Aksnes et al., 2019). There is also a pervasive belief within the scientific community that highly cited articles are indicative of research excellence (Bornmann, 2014). However, it remains important to validate this assumption, to explore whether it holds within the field of applied linguistics or whether it is compromised by extrinsic factors that may not be necessarily related to research quality.
Such an endeavor would involve conducting research in applied linguistics by applying bibliometric methods, considering broader epistemological and social perspectives (Giri, 2022), and addressing the challenges of different disciplinary contexts (Hyland & Jiang, 2019).
Bibliometrics in applied linguistics
While bibliometrics remains relatively new in applied linguistics, substantial progress has been made over the past decade (Plonsky, 2023). Previous bibliometric studies have mainly concentrated on exploring the subfields of applied linguistics. For example, Zhang (2020) traced the state of the art in second language acquisition covering a period of 20 years. Through cocitation and keyword analysis, Zhang found multiple authors who had significantly contributed to the targeted domain. In another study, Demir and Kartal (2022) explored the knowledge base and dynamics in second language pronunciation research, identifying main journals, prominent authors, influential articles, and prevailing subthemes. Individual journals in applied linguistics have also been a subject of bibliometric scrutiny, yielding a detailed portrait of the journal itself in terms of its productivity, trends, impact, and quality. For example, the collective body of works in Language Testing was analyzed by Dong et al. (2022) to present the research trends in addressing language testing-related issues. Validity was shown to be the hottest topic of all time, but there has also been a shift in research interest toward regional and international testing. The aforementioned studies, despite their focus on different topics and subject matters, share the same root in using bibliometric information to uncover hidden trends, directions, and relationships within subfields of applied linguistics. As their findings accumulate, the broader discipline would also benefit from the synthesized wisdom.
Besides subfield exploration, bibliometric methods have been used to map the entire discipline. De Bot (2015) offered a historical overview of applied linguistics research over the past 30 years by surveying academics’ views on the definition of applied linguistics, the acknowledgment of influential figures, the evolution of overarching trends, and the impact of academic research on language education. Cross-disciplinary features were also identified, which increasingly incorporate hard sciences to expand the breadth of knowledge, and, in turn, exemplify the complex nature of real-world language-related challenges. In line with de Bot (2015), Lei and Liu (2019) also observed a noteworthy shift in applied linguistics towards interdisciplinarity, as its scope now integrates both theories and practice from other research areas. While the ranked lists of publications and authors based on citation frequency were made, Lei and Liu’s (2019) study fell short of providing a comprehensive view of their interconnections and factors contributing to the trends observed. Zakaria and Aryadoust (2023) sought to fill this gap by applying scientometric methods. Specifically, through document cocitation analysis, their study visualized several research clusters in applied linguistics (1970–2022), among which a strong interconnectedness in theoretical bases was revealed. However, the factors driving the trends in highly cited papers were not explored.
Bibliometrics has proved promising in applied linguistics, as evidenced by the 2023 special issue in Studies in Second Language Teaching and Learning edited by Luke Plonsky, and the 2024 Springer handbook edited by Rajab Esfandiari and Hossein Meihami. Offering a bird’s eye view of the trends within applied linguistics and its subtopics, the aforementioned studies tend to be descriptive rather than evaluative (van Leeuwen, 2005). There is room to broaden the scope of research by examining the factors that influence research impact within the field. Specifically, citation counts have been widely regarded as the academic currency in this context, but there has been a noticeable gap in addressing the underlying factors that influence citation counts and establish them as a reliable performance indicator. It is worth mentioning that Al-Hoorie and Vitta (2019) initiated such an inquiry at the journal level, finding a positive and modest relationship between common citation-based metrics and the statistical quality of second-language journals. Taking a step forward, Xu et al. (2023) contended that the perceived prestige/quality of applied linguistics journals could be predicted by a range of extrinsic features, so extra caution was needed when using and interpreting the bibliometric indicators for journal evaluation. However, to our knowledge, no studies in applied linguistics have to date investigated the multiple factors associated with or predicting the citation counts of individual articles. Examining this multiplicity of potential predictors could offer a more in-depth understanding of the relationship between citation counts and their influencing factors, thereby enhancing the field’s theory-building capacity regarding the nature and value of citation counts.
Thus, we adopted a cross-disciplinary approach, drawing on frameworks from bibliometric and scientometric research outside the field to identify what contributes to highly cited articles in applied linguistics. This approach allows us to provide a better understanding of how scholarly influence is constructed and perceived in the unique context of applied linguistics.
Factors influencing citation counts
Tahamtan et al. (2016) carried out a review of 198 articles on citation drivers, the results of which challenged using citation counts as a surrogate for research quality. A total of 28 influencing factors were identified and categorized into three general dimensions: author-related, journal-related, and article-related features. Those that were statistically significant in prior works and are relevant to the current study are discussed in greater detail next.
Journal-related factors
Publishing in reputable journals has the potential to achieve high visibility and impact, driven by the perception that they publish content of good quality (Antonakis et al., 2014). As prestige is invisible and hard to measure, a variety of metrics have been adopted to quantify journal performance, such as Journal Impact Factor, SCImago Journal Rank, CiteScore, and h-index (see Supplemental Material A for a table of key citation metrics). A positive relationship between these indicators and citation counts has also been extensively verified across disciplines (Bornmann & Leydesdorff, 2015). Open access (OA) in journals is another key feature linked with the publication’s visibility and impact (Perianes-Rodríguez & Olmeda-Gómez, 2019). The notion of “open access citation advantage” was initially articulated by Lawrence (2001) in computer science literature, where he observed that free access articles were approximately 2.5 times more likely to be cited than articles with restricted access. After this, numerous studies compared the citation rates of OA and non-OA articles, supporting the idea that OA could accelerate the dissemination of research discoveries and lead to an increase in article citations (e.g., Eysenbach, 2006; Liskiewicz et al., 2021; Tennant, 2022).
Author-related factors
Several author-related factors have been found to influence the citation counts of published studies. First, it has been assumed that multiauthor articles yield greater citations than solo efforts, and a higher number of authors is generally associated with a larger number of citation counts (Franceschet & Costantini, 2010). One plausible explanation is that coauthors contribute diverse areas of expertise, resulting in “boundary-spanning” as they collaborate (Chen, 2012, p. 432). International coauthorship can extend the boundary as well, for it may drive citations from all countries represented in the research team (Sud & Thelwall, 2016). In addition, geographic location is seen as a factor that contributes to citation counts. West and McIlwaine (2002) took the journal Addiction as an example and observed a significant citation disadvantage in articles from developing countries. A similar pattern emerged in King’s (2004) study, where the researcher noticed the geographical disparities in scientific merit and research funding after comparing the citation shares among 31 countries. Varying degrees of funding may also have an impact on citing behaviors, as research with financial support might undergo more peer review, thereby increasing the likelihood of enhanced research quality and impact (Rigby, 2013). Another factor is the authors’ research performance, either assessed individually or collectively, which is revealed by many to be a noteworthy indicator for future citations (e.g., Grover et al., 2014; Wang et al., 2019).
It is reasonable to argue that authors with good academic records often attract more scholarly attention, due to their recognition as authorities within their respective domains (Bornmann et al., 2012).
Article-related factors
Among the article-related factors, the characteristics of titles discussed by Paiva et al. (2012) are viewed as significant predictors of citation impact in biomedical studies. It might be surprising for something as simple as a punctuation mark like a colon to negatively impact citations, but an argument was made that titles with colons tend to be longer and less tempting. In addition, the number and recency of references were noted to have positive impacts on citations in social–personality psychology research. As explained by Haslam et al. (2008), the former suggests an author’s greater familiarity with the research topic, while the latter mirrors both the vibrancy and currency of the field of study. For content-based factors, subfield-specific differences in citation practices have attracted considerable attention (Glänzel et al., 2009). The probability of an article receiving citations is substantially linked to the volume of all articles published in a particular field, and publications from the lesser-explored areas would generally receive lower counts of citations compared to those from the main trend (Bornmann et al., 2012). Furthermore, the connection between methodological orientations and citation rates has been underscored. Specifically, articles offering systematic reviews or meta-analyses tend to draw more citations than those presenting original findings (Amini Farsani et al., 2021), and methodological studies would become frequently cited because of the introduction of novel scientific tools (Padial et al., 2010).
The present study
The existing body of bibliometric literature has extensively discussed the value of citation counts as an indicator of research quality and their influencing factors across diverse disciplines. However, this area of inquiry remains largely unexplored within the scope of applied linguistics. Considering the prominent yet controversial role that citations play in research evaluation, we sought to identify factors predicting the citation counts of top-performing articles in the applied linguistics discipline by examining the following two research questions:
RQ 1. What are the bibliometric features of highly cited papers in applied linguistics published in the past 22 years?
RQ 2. What extrinsic factors influence the citation counts of these highly cited papers in applied linguistics?
Method
The study drew inspiration from the research synthesis approach (Norris & Ortega, 2006; Plonsky & Oswald, 2015), and adapted it for the bibliometric analysis reported here. The design comprises four major steps, which are discussed next: scope definition, data collection, coding procedures, and data analysis.
Scope of the study
Numerous studies have focused on defining highly cited articles, employing two commonly accepted approaches: absolute thresholds and relative thresholds (Aksnes, 2003). Notably, Bornmann (2014) identified five fundamental approaches to normalizing citation impact values, including the absolute measures using “number of top papers,” “number of citations,” and “number of cocitations,” as well as the relative values attached to “percentile rank class” and “distance from mean.” Among them, the percentile-based bibliometric indicator was adopted by more than half of the literature (66.67%), with the “top 1%” being the most frequently utilized criterion as the vanguard for scientific knowledge.
It is important to note that different fields of research may have varying standards for determining highly cited articles, so it is crucial to consider both the context and scope of the studies (Aksnes, 2003). As there is no generally accepted formula for scientific excellence in applied linguistics, we adopted the prevailing practice in the academic community. By adopting the threshold of the top 1%, we considered the relative position of a publication within the citation distribution of its specific field (Waltman & Schreiber, 2013). We thereby worked with an expected value of the dataset and were able to track the most influential articles that had garnered widespread recognition and attention in applied linguistics.
Data collection
To ensure a good representation of prominent applied linguistics publications over time, three data collection steps were undertaken under Donthu et al.’s (2021) proposed guidelines: journal selection, database selection, and data extraction.
Given that the study emphasizes the highly cited articles that have played an important role in shaping and advancing the dedicated field, we selected the journals that would potentially represent mainstream research. Following Zakaria and Aryadoust’s (2023) finding of Quartile 1 (Q1) applied linguistics-related journals, we restricted the list of candidates using SCImago Journal Rank (SJR) to “Social Sciences,” “Linguistics and Language,” “All regions/countries,” “Journals,” and “2020.” We further screened the titles of the candidate journals and kept those that are closely associated with the field of applied linguistics, and specifically center on language use, learning, teaching, and assessment (Lei & Liu, 2019; Zakaria & Aryadoust, 2023). Overall, a list of 55 top-tier journals was employed (see Supplemental Material B). For online databases, Scopus and Web of Science stand out in current bibliometric studies. However, Scopus is generally considered to have wider coverage, especially journal coverage, as demonstrated by previous research (e.g., Aksnes et al., 2019; In’nami & Koizumi, 2010). To verify this claim in the current context, two separate searches were carried out on the Web of Science and Scopus, using the terms “All Fields,” “applied AND linguistics,” and the publication year from “2000 to 2022.” The searches returned 19,008 records from the former and 219,730 records from the latter (as of 8 March 2023). Scopus was thus chosen as our database to enhance the coverage and quality of data collection. Subsequently, we performed an advanced search for target articles in Scopus. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol is presented in Figure 1.
After filtering for English publications specifically in the applied linguistics domain within the 55 Q1 journals, the literature search yielded a total of 33,031 distinct records published between 2000 and 2022 (as of 8 March 2023). Scopus holds records of 12 document types. Since we aimed at scientific papers, only articles, reviews, conference papers, and book chapters were included (Garousi & Fernandes, 2016). Consequently, the records were limited to 30,252 articles. They were further ranked in descending order based on citation counts, and the top 1% of most cited articles totaled 302. The complete query string is in Supplemental Material C.
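The top 1% cut-off described above can be sketched in a few lines. This is a minimal illustration rather than the procedure actually run in Scopus: the function names are hypothetical, and rounding down is an assumption (the rounding rule is not stated, but flooring 1% of 30,252 reproduces the reported 302).

```python
import math


def top_percent_count(n_records: int, percent: float = 1.0) -> int:
    """Size of the top `percent` slice of a ranked list (rounded down; an assumption)."""
    return math.floor(n_records * percent / 100)


def select_top_cited(citation_counts: list[int], percent: float = 1.0) -> list[int]:
    """Rank citation counts in descending order and keep the top `percent`."""
    k = top_percent_count(len(citation_counts), percent)
    return sorted(citation_counts, reverse=True)[:k]


# 1% of the 30,252 retained records, rounded down.
print(top_percent_count(30252))
```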
Each article indexed in Scopus comes with various information, including citation information, bibliographical information, abstract and keywords, etc. For the purposes of this study, data directly extracted from the database include citation counts, publication year, source venue, author(s), affiliation(s), document title, access form, and funding details. We further conducted manual searches to obtain information about the uncovered factors. Specifically, the author’s h-index and geographical origin were examined based on the particular information of the first author. The allocation of credit for multiauthor works has been a longstanding concern, but the first author is typically seen as playing a major role or contributing at least equally as compared with the other authors (Shen & Barabási, 2014). In addition, by focusing on a single author, we could control for the author-related factors that might otherwise complicate the analysis (Grover et al., 2014). There was another concern associated with our collection of the h-index data (as of the time of writing, March 2023). Although a researcher’s h-index may increase as the citation counts of a single article increase, this effect is generally minimal (Grover et al., 2014). Due to the way the h-index is calculated, this increase is capped at 1, regardless of how many additional citations the article receives, even if those numbers are substantial. Lastly, it is important to note that the information available in Scopus may not always be reliable or accurate, so we downloaded each of the targeted articles, and read their features to cross-validate the information that was recorded in Scopus.
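The cap of 1 follows from the standard definition of the h-index (the largest h such that h papers have at least h citations each). The sketch below illustrates this with a hypothetical citation record; the numbers are invented for illustration and are not drawn from the dataset.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation record for one author (illustrative numbers only).
record = [50, 30, 12, 5, 4, 2]
before = h_index(record)

# Give the least-cited paper a very large number of extra citations.
boosted = record[:-1] + [10_000]
after = h_index(boosted)

# However large the boost to a single paper, h rises by at most 1.
print(before, after)
```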
Coding procedures
Since the construct of interest in this study is the presumed merit of an individual article as assessed by citation practices, citation counts were initially considered as the dependent variable. However, older articles are likely to accumulate more citations simply because they have been available longer, providing researchers with more opportunities to cite them (Aksnes et al., 2019). To overcome this bias, we used “citations per year” to normalize citations by the number of years since publication, a practice supported by extensive research (e.g., Antonakis et al., 2014; Garousi & Fernandes, 2016; Liskiewicz et al., 2021). Although the distribution of citations over time is often not linear in general, the aging process of Social Sciences and Humanities research, especially highly cited articles, tends to be significantly slower, compared to highly cited articles in Natural Sciences (Aksnes, 2003; Giri, 2022). For this reason, the current study took a long citation window (2000–2022), and it was observed that the majority of the target articles are rather aged, with 93.7% (n = 283) having over 10 years of citation history. They would receive a burst of citations initially but reach a steady state over time, thereby justifying “citations per year” to normalize the performance of highly cited applied linguistics articles.
The selection of independent variables meets three primary requirements: (a) they have been identified as predictors of citation counts in prior empirical studies, (b) they are not directly tied to scientific merit, and (c) they can be quantified and measured for further statistical analysis (Tahamtan et al., 2016). As such, a total of 11 independent variables were considered, covering two journal-related, five author-related, and four article-related factors. Table 1 presents the coding scheme for these variables. The detailed coding manual on applied linguistics specialties is provided in Supplemental Material D.
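As a rough sketch of the “citations per year” normalization: the exact way years since publication were counted is not specified in the text, so the version below assumes the difference between the data-collection year (2023) and the publication year, with a floor of one year. The function name and the example figures are hypothetical.

```python
def citations_per_year(total_citations: int, publication_year: int,
                       census_year: int = 2023) -> float:
    """Normalize lifetime citations by years since publication.

    Assumption: years = census_year - publication_year, floored at 1
    so that articles from the census year do not divide by zero.
    """
    years = max(census_year - publication_year, 1)
    return total_citations / years


# Hypothetical article: 440 lifetime citations, published in 2003.
print(citations_per_year(440, 2003))
```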
Intercoder reliability was then examined to assess the consistency and accuracy of the coding results. The first round of coding involved processing the entire dataset. A second coder, with an educational background in applied linguistics, was invited to perform the task independently and label 60 randomly selected articles (20% of the whole dataset). The overall agreement rate reached 97.22%. Discrepancies in results were ultimately addressed through coders’ discussions and clarifications regarding the criteria for coding assignments.
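An agreement rate of this kind can be computed as simple percent agreement, sketched below. The text does not state how coding decisions were counted, so the function name and the example labels are hypothetical and serve only to show the calculation.

```python
def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    """Share of coding decisions (in %) on which two coders assigned the same label."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must label the same items.")
    if not coder_a:
        raise ValueError("No coding decisions to compare.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical labels for four coding decisions; the coders disagree on one.
first_coder = ["quantitative", "qualitative", "mixed", "synthesis"]
second_coder = ["quantitative", "qualitative", "mixed", "quantitative"]
print(percent_agreement(first_coder, second_coder))
```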
Data analysis
In their discussion of regression-based methods for citations, Thelwall and Wilson (2014) suggested regression models are highly advantageous, as they enable the simultaneous examination of the effects of numerous variables. Therefore, we employed multiple linear regression analysis using STATA Version 15 to identify the potential drivers of citation counts in applied linguistics literature (RQ2).
Prior to the regression analysis, we conducted a descriptive analysis to explore the highly cited applied linguistics articles, in terms of the chronological distributions of their citations and the statistical properties of their journal-related, author-related, and article-related features (RQ1). Spearman’s correlation analysis was subsequently performed, which revealed bivariate correlations between variables, as a way to assess the potential presence of multicollinearity (Field, 2018).
The distribution of the dependent variable was observed to be skewed, thereby having a higher chance of violating the assumptions of general linear models (Thelwall & Wilson, 2014). To address this concern, we applied a logarithmic transformation of the dependent variable in regression analysis, as recommended by extensive evidence (e.g., Onodera & Yoshikane, 2015; Rigby, 2013; Thelwall & Wilson, 2014). Assumptions including linearity, homoscedasticity, normality, and absence of multicollinearity were examined to ensure the goodness of fit for linear regression analysis. The detailed diagnostic report is available in Supplemental Material E, demonstrating that all the assumptions have been satisfied.
Regression outliers were operationalized as cases whose standardized residuals fell outside the range between –3 and +3, which identified four cases (Cases 5, 6, 8, and 162). However, there was no influential case, as examined by Cook’s Distance (<1) and the leverage value (<0.5). Meanwhile, the removal of outliers, as shown by Osborne and Overbay (2004), can reduce the likelihood of Type I and Type II errors, thereby enhancing the overall accuracy of estimations. As such, after the exclusion of four outliers based on their unusual values, multiple linear regression was run again to obtain the model’s parameter estimates.
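The ±3 screening rule can be illustrated with a simplified sketch. It standardizes each residual by the sample standard deviation of all residuals; statistical packages typically also adjust each residual for its leverage, which this sketch does not attempt. The function name and the residual values are hypothetical.

```python
import statistics


def flag_regression_outliers(residuals: list[float], cutoff: float = 3.0) -> list[int]:
    """Indices of residuals whose standardized value lies outside +/- cutoff.

    Simplification: residuals are centered and divided by their sample
    standard deviation; no leverage adjustment is applied.
    """
    mean = statistics.mean(residuals)   # close to 0 for OLS residuals
    sd = statistics.stdev(residuals)    # sample standard deviation
    return [i for i, r in enumerate(residuals) if abs((r - mean) / sd) > cutoff]


# Hypothetical residuals: twenty small values and one large one at index 20.
residuals = [0.1, -0.1] * 10 + [2.0]
print(flag_regression_outliers(residuals))
```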
Results
Chronological analysis
Figure 2 shows a scatter plot of the targets’ citation counts against publication years. There are 302 points in total, each representing a studied article. The majority of the literature tends to be concentrated on the left side of the timeline, indicating that they are from earlier years. It is worth noting that several recent research articles (such as Cases 5, 6, and 8) have also received substantial citation counts, overlapping the previously mentioned outliers and adding to the need for further investigation into their significance. Figure 3 illustrates the sum of lifetime citations for highly cited research published each year. Notably, those with a longer duration of circulation generally exhibit a higher total of citation counts, with articles published in 2006 receiving the highest number of citations, reaching a total of 11,994. Following closely behind are the publications from 2000 (10,850), and 2003 (9,786), while there was a sharp drop in citation counts in 2012, from 5,424 in 2011 to 2,098, and the counts have remained persistently low since then.
Statistical properties of variables
Descriptive statistics of the variables are presented in Supplemental Material F. In terms of the continuous variables, the corpus of 302 articles was cited 22.36 times per year on average, with a range between 9.87 and 165 times. Skewness (4.531) and kurtosis coefficients (30.441) indicate that the degree of annual citations varies considerably (see Figure 4a). The number of authors also exhibits a departure from normality, as can be observed in Figure 4b (skewness: 4.698; kurtosis: 38.531), with a range from 1 to 15 and a mean of 1.84. The number of references is right-skewed (skewness: 2.097; kurtosis: 5.631), and the range of variation goes from 9 to 319 (see Figure 4c). Descriptive statistics of the categorical variables are shown in the form of frequency analysis, visualized in Figure 5. For each subcategory, the total number of citations and the average number of citations per publication were also calculated (see Table S9 in Supplemental Material F).
Correlation analysis
Based on the correlation matrix presented in Supplemental Material G, we found that the dependent variable is positively correlated with CiteScore (r=.134, p<.05), open access (r=.159, p<.01), h-index (r=.223, p<.01), number of references (r=.140, p<.05), subfield in language contact (r=.153, p<.01), and method using research synthesis (r=.157, p<.01). By contrast, there is a significant negative association between mixed-methods orientation and annual citations (r=–.155, p<.01). Regarding the pairs of independent variables, the count of references is positively associated with the method using research synthesis (r=.428, p<.01), and the number of authors has a positive correlation with international coauthorship (r=.363, p<.01). However, the correlation coefficients outlined previously fall within a weak (0.1 to 0.3) to moderate (0.31 to 0.7) range, indicating that there is no multicollinearity in the data.
Hierarchical linear regression
We estimated the linear regression models hierarchically. All variables were first entered into a single model, and then article-related factors were excluded, followed by the removal of author-related factors. This allowed us to determine the optimal fitting model while examining the incremental contribution of each set of factors to the model. As demonstrated in Table 2, the first model has the strongest explanatory power among the three models fitted, with 20.8% of the observed variance of annual citations explained by the predictor variables: F(21, 276)=3.454, p<.05, R²=.208, adjusted R²=.148. See Supplemental Material H for the full regression estimates.
Note:
a all predictors;
b article-related factors removed;
c author-related factors removed.
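The fit statistics reported for Model 1 are mutually consistent, which can be checked with the standard formulas for adjusted R² and the overall F-test. The sketch below is an illustration under one assumption: the sample size is inferred from the reported degrees of freedom, F(21, 276), which implies 21 predictors and 298 observations entering the regression. The function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, k, df_resid):
    """Overall F statistic computed from R^2 and the degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / df_resid)


# Values reported for Model 1 in the text.
r2 = 0.208
k = 21          # numerator degrees of freedom (number of predictors)
df_resid = 276  # denominator degrees of freedom

# Assumption: n is inferred as df_resid + k + 1, not taken from the text.
n = df_resid + k + 1

print(f"n={n}")
print(f"adjusted R^2={adjusted_r2(r2, n, k):.3f}")
print(f"F({k}, {df_resid})={f_statistic(r2, k, df_resid):.3f}")
```

The adjusted R² rounds to the reported .148, and the F value agrees with the reported 3.454 up to the rounding of R² to three decimals.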
By examining the p-values for each unstandardized regression parameter (b-coefficients) in Model 1, we identified four continuous independent variables that significantly predicted the citation counts. Specifically, the number of authors (b=.068, t=3.624, p<.01), the CiteScore (b=.025, t=2.250, p<.05), and the author’s h-index (b=.003, t=2.041, p<.05) are positive predictors of citations per year, whereas the title’s length (b=–.002, t=–2.272, p<.05) had a negative impact on the likelihood of an article achieving high citations. Among categorical independent variables, the difference between articles with open access and with restricted access is statistically significant (b=.130, t=2.158, p<.05), demonstrating that articles with open access were cited an average of 13% more than articles with restricted access. When compared to the baseline category “others” of the subfield, the slopes for language use (b=.385, p<.05) and language contact (b=.442, p<.01) are significant. In particular, the b-coefficients suggest that articles on language use and language contact were cited more than articles labeled as “others” by an average of 38.5% and 44.2%, respectively. Additionally, articles with mixed methods (b=–.189, p<.05) were cited significantly less, by an average of 18.9%, than articles that used research synthesis.
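Reading a b-coefficient directly as a percentage difference is the usual shorthand for models in which the dependent variable is log-transformed. The sketch below shows the exact conversion, (exp(b) − 1) × 100, for the dummy-variable coefficients reported above. It rests on an assumption that this section does not confirm: that annual citations were transformed with the natural logarithm. If a different transformation was used, the conversion would differ. The function name is hypothetical.

```python
from math import exp


def percent_change(b):
    """Exact percentage change in the outcome implied by coefficient b.

    Assumption: the dependent variable is natural-log transformed, which
    this section of the article does not state explicitly.
    """
    return (exp(b) - 1.0) * 100.0


# Dummy-variable coefficients as reported in the text.
reported_b = {
    "open access (vs. restricted access)": 0.130,
    "language use (vs. others)": 0.385,
    "language contact (vs. others)": 0.442,
    "mixed methods (vs. research synthesis)": -0.189,
}

for label, b in reported_b.items():
    approx = b * 100.0  # shorthand reading used in the text
    print(f"{label}: approx {approx:+.1f}%, exact {percent_change(b):+.1f}%")
```

Under this assumption the shorthand is close for small coefficients such as open access, while for larger coefficients such as language contact the exact figure is noticeably higher than b × 100.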
Standardized regression and sheaf coefficients
Given that the regression estimates only revealed the individual impact of dummy variables as compared to the reference group, we calculated sheaf coefficients using STATA to understand the combined explanatory effect of the dummies on the dependent variable (Buis, Reference Buis2009).
Table 3 presents the standardized coefficients for both sheaf and nonsheaf predictors. There were seven statistically significant predictors of highly cited articles’ annual citations in total: number of authors, subfield, methodology, title length, CiteScore, accessibility, and scholar h-index. Comparing their standardized coefficients, we found that author counts exhibited the strongest predictive power (β=.219, p<.01). This is closely followed by the set of subfield dummies (β=.208, p<.01), whose sheaf coefficient is almost twice the standardized coefficient of the author’s h-index (β=.117, p<.05). The overall effect of methodology was statistically significant (β=.160, p<.05). Moreover, journal CiteScore (β=.133, p<.05) and title length (β=–.134, p<.05) exhibit similar magnitudes of explanatory strength but in opposite directions. Finally, the estimate of accessibility (β=.125, p<.05) indicates a modest citation advantage for open-access articles over articles that are not open access.
Note:
a standardized regression coefficients;
b standardized sheaf coefficients;
variables ranked based on the descending order of absolute values of standardized coefficients;
**p < .01, *p < .05
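The ranking described for Table 3 can be reproduced from the standardized coefficients quoted in the text. The sketch below simply sorts the seven reported values by absolute magnitude and computes the ratio between the subfield sheaf coefficient and the h-index coefficient. It does not re-estimate anything; the variable names are illustrative.

```python
# Standardized coefficients as reported in the text (sheaf coefficients
# for the subfield and methodology dummy sets).
standardized = {
    "number of authors": 0.219,
    "subfield (sheaf)": 0.208,
    "methodology (sheaf)": 0.160,
    "title length": -0.134,
    "CiteScore": 0.133,
    "accessibility": 0.125,
    "scholar h-index": 0.117,
}

# Rank predictors by the absolute value of their standardized coefficient.
ranked = sorted(standardized.items(), key=lambda item: abs(item[1]), reverse=True)
for rank, (name, beta) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: beta={beta:+.3f}")

# Relative size of the subfield effect compared with the h-index effect.
ratio = standardized["subfield (sheaf)"] / standardized["scholar h-index"]
print(f"subfield / h-index = {ratio:.2f}")
```

Sorting on absolute values matters here because title length has a negative sign yet ranks above CiteScore in magnitude.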
Discussion
Since the main objective of this study was to determine what makes articles highly cited in the applied linguistics field, the findings are discussed factor-by-factor as follows.
Number of authors
Among all the predictor variables, the number of authors was found to have the strongest predictive power of annual citations in the present study. In keeping with previous research (e.g., Didegah & Thelwall, Reference Didegah and Thelwall2013; Franceschet & Costantini, Reference Franceschet and Costantini2010; Padial et al., Reference Padial, Nabout, Siqueira, Bini and Diniz-Filho2010), the positive relationship can be explained by: (a) self-citations: adding an author to an article potentially raises the likelihood of self-citations of the article; and (b) visibility: multiauthor articles are more likely to span across different subject fields, thereby broadening the network through authors’ personal contacts and enhancing readership. While these two dimensions of a coauthored article are largely independent of research quality, collaboration itself can also be a promising avenue for scientific merit (Xu et al., Reference Xu, Zhuang, Blair, Kim, Li, Thorson Hernández and Plonsky2023). Applied linguists in different areas of expertise, as Amini Farsani et al. (Reference Amini Farsani, Jamali, Beikmohammadi, Ghorbani and Soleimani2021) suggested, would engage in reciprocal works on L2 multifaceted issues to the extent that their interactions influence or are influenced by each other. When assessing the citation advantage of research collaboration, it is thus important to take a step further by examining whether this shared benefit stems from the publication’s enhanced inner quality or from the extrinsic factors we mentioned. For authorship patterns, our data on the number of authors exhibited a right-skewed distribution. A tendency was accordingly revealed towards individual research or small-scale collaboration in applied linguistics, potentially influenced by factors such as the field’s research nature, cultural norms, resource availability, and specific publication practices (Franceschet & Costantini, Reference Franceschet and Costantini2010).
While the importance of collaboration in scientific publishing has been thoroughly discussed (see, for example, Hyland, Reference Hyland2015), investigating the specificity of collaboration patterns remains an intriguing research area in applied linguistics (Amini Farsani & Jamali, Reference Amini Farsani and Jamali2023; Amini Farsani et al., Reference Amini Farsani, Jamali, Beikmohammadi, Ghorbani and Soleimani2021).
Subfield
The overall effect of the subfield factor on citations is significant in our study, which highlights the need to correct for subfield-specific profile heterogeneity in evaluative or comparative bibliometric studies in applied linguistics (Glänzel et al., Reference Glänzel, Thijs, Schubert and Debackere2009). We identified that language contact, although not the largest in shares, tends to exhibit the highest annual citations compared to all other categories. As we went deeper, it became apparent that out of the top ten most cited articles in our dataset, four are under this category, all centering on the issue of translanguaging. Such an area has recently surged in popularity in academia, leaving a profound impact on contemporary educational landscapes (Lin & Lei, Reference Lin and Lei2020). While particularly focusing on the advancement of research methodology, the base category “Others” exerts the least effect on annual citations and has the smallest portion (2.318%). This may be because only in the recent decade did researchers embrace a nascent call for fostering methodological awareness in applied linguistics (Plonsky, Reference Plonsky, Loewen and Sato2017). As indicated by Bornmann et al. (Reference Bornmann, Schier, Marx and Daniel2012), the likelihood of receiving citations is associated with the volume of articles published across fields, so writing from less-explored areas may also garner less visibility, and consequently, fewer citations.
Methodology
The combined effect of the methodology dummy variables was statistically significant while holding other variables constant. This finding resonates with many previous studies (e.g., Amini Farsani et al., Reference Amini Farsani, Jamali, Beikmohammadi, Ghorbani and Soleimani2021; Antonakis et al., Reference Antonakis, Bastardoz, Liu and Schriesheim2014; Grover et al., Reference Grover, Raman and Stubblefield2014), and showcases the significant contribution of methodology choice to future citations. Specifically, research synthesis tends to have higher citation rates than other approaches, largely because it draws from a vast body of prior research, from which a wealth of information can be obtained and processed (Tahamtan et al., Reference Tahamtan, Safipour Afshar and Ahamdzadeh2016). Our study also mirrors the methodological inclinations in applied linguistics, where the quantitative approach (25.17%) keeps its dominant role, and the mixed-methods approach (24.5%) increases its popularity in addressing language-related problems (Riazi & Amini Farsani, Reference Riazi and Farsani2024). The share of research synthesis (18.87%) implies a shift that embraces secondary research (Chong & Plonsky, Reference Chong and Plonsky2023), taking accumulated studies as its source of data while shedding empirical light on future applied linguistics studies. Overall, the observed research patterns improve our understanding of the interplay between methodological approaches and citation practices, while also ensuring that we stay informed about the current research trends in applied linguistics.
Title length
A negative association was revealed between title length, measured in characters, and the number of citations that applied linguistics articles received annually. Attention, appreciation, and visibility turn out to be the fundamental mechanisms in driving the citations and the overall influence of a given publication (Nair & Gibbert, Reference Nair and Gibbert2016). As explained by Paiva et al. (Reference Paiva, Lima and Paiva2012), articles with brief titles may capture more attention from readers compared with those with lengthier titles. A short title is likely to stand out during users’ quick scans of search results, while a longer one might be perceived as confusing or dull. In addition, the ease of reading and understanding shorter article titles may contribute to increased engagement, as readers are more likely to proceed to the abstract or the full text (Haslam et al., Reference Haslam, Ban, Kaufmann, Loughnan, Peters, Whelan and Wilson2008). As such, opting for brevity may broaden readership, enhance visibility, and ultimately boost the chances of receiving citations. Moreover, it is important to note that length is merely one of the numerous title-related characteristics that may influence citations, and more structural and content-related elements need to be considered in the long run (Nair & Gibbert, Reference Nair and Gibbert2016).
CiteScore
The positive association between journal CiteScore, which is sometimes perceived as an index of journal “prestige,” and article citations is verified in the current research. This finding comes as no surprise since CiteScore is calculated on an annual basis, reflecting the average number of citations received by all items published in a certain Scopus-indexed journal (Roldan-Valadez et al., Reference Roldan-Valadez, Salazar-Ruiz, Ibarra-Contreras and Rios2019). In this regard, this metric is mathematically associated with article citations. While we note that a single bibliometric indicator is insufficient to evaluate the multifaceted dimensions contributing to the merits of L2 journals (Al-Hoorie & Vitta, Reference Al-Hoorie and Vitta2019), studies have shown the relative accuracy of CiteScore in quantifying a journal’s citation impact (e.g., Croft & Sack, Reference Croft and Sack2022; Roldan-Valadez et al., Reference Roldan-Valadez, Salazar-Ruiz, Ibarra-Contreras and Rios2019). A high CiteScore indicates that the journal has achieved success in terms of its articles being cited, and will continue to draw in more citations. The “halo effect” plays a crucial role here, indicating the tendency for the overall journal evaluation to be affected by first impressions and prior performance (Liao, Reference Liao2021). Journals with good citation performance are favored by many scholars when citing others’ works as well as publishing their own, as the outlets are likely perceived to garner larger reading audiences and higher-quality papers in a particular field. It becomes apparent that the impact of a journal significantly influences its citation rates and the decisions of authors regarding where to submit their work (Didegah & Thelwall, Reference Didegah and Thelwall2013).
Accessibility
We discovered a meaningful and modest relationship between open access (OA) and subsequent citations. On the one hand, this citation advantage of OA resonates with dozens of prior studies (see Tennant, Reference Tennant2022), and validates its potential in accelerating the recognition and dissemination of scholarly outputs (Eysenbach, Reference Eysenbach2006). As Perianes-Rodríguez and Olmeda-Gómez (Reference Perianes-Rodríguez and Olmeda-Gómez2019) argued, more access enables higher “visibility, retrievability, audience, usage, earlier discussions, verifications, and collaborations” (p.11), and consequently, better chances for citations. On the other hand, establishing causality for a limited relationship is hard, as there are different academic cultures and various possible confounders to consider. One possible explanation could be that the impact of OA on Q1 journals is less pronounced as scholars tend to cite these prominent journals within their fields regardless of their accessibility policies (Li et al., Reference Li, Wu, Yan and Li2018). Regarding the OA prevalence, we observed a limited percentage of highly cited articles (18.54%) that are freely accessible. A growing consensus has been reached in applied linguistics on the importance of Open Science (Liu et al., Reference Liu, Chong, Marsden, McManus, Morgan‐Short, Al-Hoorie, Plonsky, Bolibaugh, Hiver, Winke, Huensch and Hui2022), the umbrella term that encompasses open access. Despite the OA citation advantage, therefore, there is still a lack of incentives for researchers to make their endeavors publicly available, making it imperative to further explore the underlying reasons.
Scholar h-index
An author’s research performance, measured by the h-index, was found to significantly predict article citations. The Matthew Effect (Merton, Reference Merton1968), that is, the cumulative advantage in science, plays an important role here, suggesting publications of equal intrinsic quality will be cited differently depending on their authors’ eminence (Aksnes et al., Reference Aksnes, Langfeldt and Wouters2019). Similar to the halo effect of high-impact journals, researchers who possess certain advantages, such as high productivity and impact in the past, tend to attract more attention and resources, as well as citations in their subsequent work (Haslam et al., Reference Haslam, Ban, Kaufmann, Loughnan, Peters, Whelan and Wilson2008). It may also lead to a more skewed citation distribution, with less established individuals being underestimated and receiving insufficient credit for their scientific accomplishments. This prompts us to reflect on the validity of citation counts to assess applied linguistics practices because ideally, recognition should be awarded based on the works’ quality regardless of authors’ extrinsic influences (Mammola et al., Reference Mammola, Piano, Doretto, Caprio and Chamberlain2022). Since we focused only on the first author, this factor may not sufficiently explain the overall authorship pattern of the target articles. There could also be a notable impact from the potential leaders (for example, corresponding authors), as well as the collective wisdom in certain disciplines (Wang et al., Reference Wang, Fan, Zeng and Di2019). Therefore, it is crucial to explore further the performance of various contributors involved in the matter.
Number of references
While the number of references was positively correlated with the citation rate of highly cited articles, their association lost statistical significance when other factors were controlled for. This finding is unexpected, given the prevailing notion that a higher number of references tends to attract more citations, supported by earlier studies (e.g., Bornmann et al., Reference Bornmann, Schier, Marx and Daniel2012; Haslam et al., Reference Haslam, Ban, Kaufmann, Loughnan, Peters, Whelan and Wilson2008; Onodera & Yoshikane, Reference Onodera and Yoshikane2015). However, we considered the number of references more as a single structural attribute, regardless of whether all references together are intricately integrated into a coherent synthesis for scientific worth (Grover et al., Reference Grover, Raman and Stubblefield2014). It is probably not reasonable to assume that simply increasing an article’s number of references will increase its citations, and in that regard, the quality of references probably matters more. Notably, research synthesis was moderately correlated with reference counts and received higher annual citations per article than other methodological orientations. After excluding the methodology variable from the regression model, we observed that the references variable regained its statistical significance. In other words, there is a potential indirect relationship between the number of references and the citation outcomes, mediated by the methodology variable.
Funding
The current study demonstrated that funding was not a significant determinant of the increased citations. This may be due to the disciplinary differences in the importance of funding. Receiving funds is more crucial to expensive experiment-based research projects in certain fields, as it ensures access to the necessary equipment (Jowkar et al., Reference Jowkar, Didegah and Gazni2011). The inherent variance in the characteristics of funding sources, intensity, and variety would also muddy the waters (Rigby, Reference Rigby2013). This calls for a more thorough examination of the identities of funding bodies, which may offer varying levels of support. We also found that a relatively small proportion of highly cited studies (27.15%) were funded. Heightened competition for financial support may be accountable for this, which has led to a surge in proposal submissions and a decrease in success rates for securing grants (Roebber & Schultz, Reference Roebber and Schultz2011). Additionally, factors like efficient resource allocation, pressure to deliver specific outcomes, alignment with funding objectives, and the pursuit of novelty can impact the effectiveness and reach of funded research. Future work should delve deeper into the nuanced relationship between funding and citation rates, particularly examining how different funding models and management practices influence research impact.
Internationality
While international collaboration is generally known to have a citation advantage across fields (Sud & Thelwall, Reference Sud and Thelwall2016), it did not emerge as a statistically significant predictor in our study. This observation is, however, in alignment with Persson’s (Reference Persson2010) research, which suggests that the mixed effects of international coauthorship on research impact are not solely due to the inherent nature of cooperation itself, but also external confounding subfactors. For example, conclusions regarding the significance of global collaborative efforts might inadvertently be oversimplified if we view all countries as having equal weights in research contributions (Sud & Thelwall, Reference Sud and Thelwall2016). Our dataset was observed to be more locally oriented, with only 18.21% having coauthors from different countries. This might be explained by the fact that half of the dataset consists of articles with first authors from the United States and Canada. We also found a significant negative correlation between researchers from these two countries and international cooperation, implying they prefer conducting research individually or collectively at the national level. As such, country-specific features play a role in interpreting the aforementioned results (Salager-Meyer, Reference Salager-Meyer2008).
Geographical origin
No statistically significant association between geographical divisions and citation counts was revealed in our analysis. This casts doubt on the conclusions drawn from prior studies (e.g., King, Reference King2004; Nielsen & Andersen, Reference Nielsen and Andersen2021; West & McIlwaine, Reference West and McIlwaine2002), which indicate that citation impact varies across national boundaries. However, our research did not measure the location of authors on a country-by-country basis; instead, it was organized into continental groupings. The interpretation of citation impact could become ambiguous when different units and levels of analysis are employed (Persson, Reference Persson2010). A more sophisticated citation landscape can be unfolded at the author level by considering the diverse components closely tied to authors’ geographical differences. This entails examining the privileges of authors from certain countries or regions, in terms of research funding, facilities, prestige, collaboration, database coverage, and other elements that contribute to the intricacies of publication and citation practices (Nielsen & Andersen, Reference Nielsen and Andersen2021; Salager-Meyer, Reference Salager-Meyer2008). As for the global distribution of highly cited articles, the American continent ranks first in shares (51.66%), signifying its prominent role in academic publishing (Lei & Liu, Reference Lei and Liu2019; Xu et al., Reference Xu, Zhuang, Blair, Kim, Li, Thorson Hernández and Plonsky2023). Europe occupies the second position (26.49%), followed by countries from Asia (12.58%) and Oceania (9.27%) with notably smaller percentages. As such, the finding underscores the stark disparities across regions, as well as the uneven distribution of scientific outcomes in the global context.
Conclusion
This study unpacked the dynamics underlying citation practices in applied linguistics research. Collectively, all our factors predicted 20.8% of the variance (R²=.208, p<.05) in citation counts. This aligns with the finding in Mammola et al.’s (Reference Mammola, Piano, Doretto, Caprio and Chamberlain2022) meta-analysis, which suggested that the overall predictive power of extrinsic features on citations was relatively small across the scientometric literature. It is an anticipated outcome, given that all the employed explanatory variables were independent of research quality. The remaining variability could be attributable to intrinsic scientific quality, the principal driving force, alongside other uncharted extrinsic features (Onodera & Yoshikane, Reference Onodera and Yoshikane2015). Therefore, we aimed not at building a model with a perfect fit, but rather at exploring the significance of extrinsic factors contributing to the high citations of applied linguistics research.
For those statistically significant variables, a common thread lies in the enhanced visibility arising from the better performance of journals and authors, greater access to articles, wider range of cooperation, shorter length of titles, and larger popularity of the subfield in academia. In other words, the observed citation patterns could be partially interpreted as a reflection of visibility dynamics. These dynamics, grounded in the social constructivist view, are at the core of the recognition and dissemination of research discoveries (Aksnes et al., Reference Aksnes, Langfeldt and Wouters2019). Their significance would also grow over time because of the self-intensifying process, known as the Matthew effect (Aksnes, Reference Aksnes2005). While esteemed authors with their following work keep receiving recognition and citations, new or unknown scholars are still struggling to find their readership (Bornmann et al., Reference Bornmann, Schier, Marx and Daniel2012). As such, these underlying social mechanisms offer us a conceptual and unified explanation for the roles of extrinsic properties in shaping the citation landscape of applied linguistics literature.
It is important to acknowledge that the study has three major limitations. First, the selected factors may not be comprehensive enough considering the complexities underlying citing behaviors (Tahamtan & Bornmann, Reference Tahamtan and Bornmann2019). Limited resources from the databases constrained the scope of variables we could measure and incorporate into the study (In’nami & Koizumi, Reference In’nami and Koizumi2010). It is thus imperative to secure more relevant factors, which allow for a more nuanced examination of the subject, ultimately leading to better-informed findings. Another area for advancement involves exploring the intrinsic attributes associated with research quality and citations. While we have asserted the significance of extrinsic factors, it is important to assess the degree to which citations are correlated with research quality. Since citations can also be made to uncover research limitations, inconsistencies, or flaws that are unrelated to scientific merits, a detailed classification of the types of citations (i.e., positive, neutral, and negative citations) is needed to more accurately interpret the true quality of applied linguistics research (Xu et al., Reference Xu, Ding and Lin2022). Finally, both the depth and breadth of the study need to be augmented. Longitudinal studies could provide insights into the evolving impact of factors on citation patterns over time, offering a more comprehensive overview of their long-term effects. To triangulate the results, it would be fruitful to interview the practitioners to determine whether their citing intentions align with the factors we have found. Beyond these tasks, we believe a more robust understanding of citation practices in the applied linguistics field can be achieved.
In sum, citation counts play an increasingly significant role in an academic’s life. A survey by Abbott et al. (Reference Abbott, Cyranoski, Jones, Maher, Schiermeier and Van Noorden2010) published in Nature revealed that 70% of respondents believed citation counts were used for making decisions on the tenure and promotion of faculty members. The survey also indicated that the majority of the respondents were either not satisfied at all or not very satisfied with how the quantitative metrics were used. In alignment with previous research, the present study showed that citation counts are subject to several factors, not all of which are strictly scientific. By revealing what was being measured by citation counts, we did not intend to imply that applied linguistics researchers manipulated or made strategic choices for the increased impact of their publications. Instead, we aimed to help them understand how the invisible factors and fundamental social mechanisms in academic publishing and knowledge dissemination may work. It is also hoped that the study could serve as a prompt for policymakers and bibliometricians, facilitating their research evaluation and informed decision-making in the field. While the affordances of bibliometrics and citation analysis in assessing the performance of researchers, publications, and institutions have been increasingly recognized in applied linguistics, the present study extended the boundaries of evaluative bibliometrics towards fairness and scientific rigor. The observed extrinsic factors may suggest that a good start is to reduce the reliance on metrics for better science (Oransky & Marcus, Reference Oransky, Marcus and Abritis2023), although the value of transparency and the objectivity provided by quantitative metrics should not be overlooked (Abbott et al., Reference Abbott, Cyranoski, Jones, Maher, Schiermeier and Van Noorden2010).
As no consensus has been reached regarding the best approach to assessing publication merit, we further suggest a combination of multiple citation-based indicators with qualitative measures, such as peer reviews and recommendation letters from objective referees. This is much in line with the leading principles in the Leiden Manifesto (Hicks et al., Reference Hicks, Wouters, Waltman, De Rijcke and Rafols2015), which emphasize the scientific use of metrics to ensure the reward only goes to good academic practices.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0272263124000743.
Acknowledgments
Our sincere gratitude goes to the Singapore Ministry of Education (MOE) and the China Scholarship Council (CSC) for supporting the first author’s master’s study at the National Institute of Education, Nanyang Technological University (NIE, NTU). We would like to thank Zhao Huijun for assistance in the second round of coding. We are also grateful to SSLA’s editor, Luke Plonsky, and the anonymous reviewers for their constructive feedback. AI was used to revise some of the sentences for improved clarity.
Competing interest
The authors declare none.