Highlights
• Cognitive flexibility was enhanced by short intensive training.
• Pre-existing cognitive flexibility predicted enhanced interpreting performance.
• Pre-existing executive functions (EFs) showed negative correlations with EF gains.
• Early-stage acquisition of interpreting skills relies on domain-general EFs.
1. Introduction
The scholarly evidence for the ‘bilingual advantage’, namely, cognitive benefits induced by bilingualism, remains conflicting. It is possible that specific types of bilingual experience are associated with different aspects of cognitive enhancement (Paap et al., 2015). For instance, bilinguals who switch between languages more frequently showed higher efficiency in the colour-shape shifting task (Han et al., 2022). If different types and intensities of bilingual processing lead to corresponding cognitive changes, it is worth exploring special cases of bilingual language use, such as oral interpreting. Many studies have reported an association between interpreting experience and cognitive benefits (see Hervais-Adelman & Babcock, 2019; Nour et al., 2020, for reviews). In parallel, the ‘interpreter advantage’ hypothesis was proposed, which postulates that task-specific cognitive skills acquired through interpreting experience can transfer to domain-general executive abilities (García, 2014).
This line of research has mainly addressed executive function (EF), examining one or more of its components. EF consists of three core components: inhibitory control, working memory and cognitive flexibility (Diamond, 2013). Inhibitory control refers to the deliberate suppression of tendencies or specific responses, such as moving attention away from distractions or refraining from a learnt response. Working memory involves holding and manipulating information that is no longer perceptually available, as when making sense of spoken language that unfolds over time. Cognitive flexibility is the ability to think along multiple paths or change perspectives, as in empathising with others or solving problems in different ways. As with many other competencies, EF abilities are shaped by experience, developing through repeated practice and problem-solving (Diamond, 2013).
Because interpreting involves real-time, intensive speech conversion, it may impose greater demands on EF than other bilingual tasks. In a typical bilingual scenario, an individual stays focused on the current language goal while avoiding cross-language interference. In other contexts, individuals can freely code-switch to reduce cognitive load when their interlocutors are equally proficient in both languages (Green & Abutalebi, 2013). Simultaneous interpreting (SI), however, represents a third mode of bilingual processing: it requires the concurrent activation of two languages and imposes a predetermined, overlapping sequence of comprehension in one language and production in the other. Because inhibition of the non-target language, cross-language conversion and target-language articulation all occur at the same time, SI offers fewer opportunities for opportunistic planning than other bilingual tasks. Interpreters thus face heightened demands on cognitive management, which may give rise to cognitive advantages distinct from those typically seen in other bilinguals (García, 2014).
Indeed, interpreters have consistently outperformed matched bilinguals in tasks measuring cognitive flexibility, such as the Wisconsin Card Sorting Test (WCST) and the task-switching paradigm (Babcock & Vallesi, 2015; Dong & Xie, 2014; Macnamara & Conway, 2014, 2016; Yudes et al., 2011). Compared with other bilinguals, interpreters appear to have a specific advantage in cognitive flexibility, as interpreting requires rapid shifts between modalities and languages, disengaging from one and engaging the other. For inhibitory control, previous studies have found evidence of a bilingual advantage in paradigms such as the Flanker and Stroop tasks (Bialystok et al., 2008; Costa et al., 2009). In contrast, studies of trained or trainee interpreters have not consistently shown an interpreter advantage in inhibitory control (Babcock & Vallesi, 2015; Dong & Xie, 2014; Köpke & Nespoulous, 2006; Morales et al., 2015; Woumans et al., 2015; Yudes et al., 2011). However, in a study that defined bilingualism more precisely, student interpreters outperformed unbalanced bilinguals, but not balanced bilinguals, on tasks measuring inhibitory control (Woumans et al., 2015).
An event-related potential (ERP) study that used the Flanker task with students of varying interpreting experience provided evidence that interpreting experience affects conflict monitoring and interference suppression (Dong & Zhong, 2017). Students with more interpreting experience exhibited larger N2 and P3 amplitudes than those with less experience, suggesting advantages in monitoring and inhibition, respectively. One possible interpretation is that inhibitory control supports bilingual processing, including interpreting, but that enhanced inhibition in interpreters is subtle and not readily detected.
Recent studies have also revealed a close relationship between interpreting and working memory updating, that is, the ability to replace old information in working memory with new information (Miyake et al., 2000). Morales et al. (2015) found that interpreters performed significantly better than control participants in the n-back task, and this advantage was also evident in the letter memory task (Henrard & Van Daele, 2017). However, research on working memory span has yielded conflicting results. Christoffels et al. (2006) found that interpreters outperformed bilingual students and teachers on complex span tasks such as reading span and speaking span, a finding replicated by Signorelli et al. (2012) and Tzou et al. (2012). By contrast, Liu et al. (2004) found no group differences in a listening span task, while Köpke and Nespoulous (2006) found that novice interpreters performed significantly better than expert interpreters in free recall with articulatory suppression, and that the novices also outperformed bilingual controls on listening span. The available evidence thus suggests that working memory is a critical EF component supporting cognitively taxing bilingual activities such as SI, but that its enhancement may not progress linearly over the course of an interpreter's development. Factors such as age (Signorelli et al., 2012), the stage of interpreting training or experience (Tzou et al., 2012) and the nature of the task should be considered before drawing conclusions.
Taken together, the evidence suggests that the cognitive skills developed by interpreters differ substantially from those engaged in general bilingualism. Unlike other bilinguals, who navigate various interactional contexts, interpreters adhere to a single task schema that specifies a predetermined sequence of language comprehension and production (Dong & Li, 2019). It is plausible to attribute enhanced cognitive flexibility to the frequent cross-language switches in SI, and to recognise that successful SI performance is supported by robust working memory capacity and efficient coordination. Nonetheless, selection bias cannot be ruled out: individuals with superior bilingual abilities are more likely to pursue careers in interpreting, a confound that current studies cannot entirely eliminate (García, 2014). Previous studies of the interpreter advantage have often recruited professional interpreters or interpreting students who had passed rigorous screening or had received various forms of memory or attention training (e.g., Köpke & Nespoulous, 2006). Although some longitudinal studies have documented training effects of interpreting (Babcock et al., 2017; Dong & Liu, 2016), individuals with inherently higher cognitive abilities may benefit more from such training. The contribution of genetically embedded cognitive advantages (e.g., Nour et al., 2020) should not be underestimated in activities requiring extensive bilingual proficiency, such as interpreting.
Another critical question is when the interpreter advantage begins to develop (García, 2014). In a longitudinal study comparing working memory changes in students enrolled in a consecutive interpreting course with those in a general English course, Dong et al. (2018) provided evidence that a relatively brief training period (32 hours of class time and about 40 hours of after-class practice over a 16-week semester) was sufficient to induce an interpreter advantage. An inhibitory control advantage was also observed in student interpreters with less than one year of training (Woumans et al., 2015). Counterintuitively, Köpke and Nespoulous (2006) showed that novice interpreters outperformed professionals in working memory; the authors posited that novices, grappling with new skills, might invest greater cognitive effort than professionals who have internalised the task schema of interpreting. This phenomenon can be explained by the supply-demand framework of cognitive plasticity: when cognitive supply falls short of demand, as when novices tackle a cognitively demanding task, cognitive supply increases substantially (Lindenberger, 2014). In other words, before processing becomes automatic, novices typically engage in effortful processing during skill acquisition, which alters pre-existing cognitive routines and fosters new representations that keep them focused on task goals and minimise errors (Schneider & Chein, 2003). Dong and Liu (2016) also suggested that cognitive changes during multitasking training might follow a developmental curve, initially rising but eventually levelling off or even declining after reaching a peak.
Therefore, investigating cognitive changes at various stages of interpreting training, especially at the beginner level, may elucidate the overarching patterns observed in previous studies.
Previous longitudinal studies were conducted in training programmes lasting at least one semester, with several hours of training per week (Babcock et al., 2017; Dong et al., 2018; Dong & Liu, 2016). Such designs may be susceptible to environmental confounders over time and are limited in their ability to capture dynamic EF changes at the onset of training. In interpreting training, short, intensive mock-conference sessions are a common way to discuss interpreting skills efficiently and engage trainees in practical exercises. For example, Bartlomiejczyk (2010) reported a one-week practice programme simulating real-world conference interpreting conditions, in which ten interpreting students showed small improvements in interpreting skills but not in delivery. To our knowledge, Bartlomiejczyk (2010) is the only study to have examined the effect of short-term training; however, it focused only on the quality of interpreting performance and lacked a control group. The length of training is rarely considered as a factor affecting the interpreter advantage. Yet short-term training has been shown to transfer to improvements in task-relevant cognitive abilities. For example, Buschkuehl et al. (2014) showed that 7 days of working memory training led to neural adaptations associated with working memory in untrained cross-modal tasks. Wu et al. (2021) examined 8 days of inhibitory control training in Chinese-English bilinguals and observed transfer between domain-general inhibitory control and bilingual language-switching control.
Moreover, short-term training effects on cognitive control and attention have been documented for real-world activities, such as music training (20 days; Moreno et al., 2011) and meditation practice (5 days; Tang et al., 2007). One week of immersion in a second-language learning environment has also been found to change EF (Bak et al., 2016). Taken together, this evidence consistently points to short-term training effects on cognitive abilities. However, little is known about whether short-term training is sufficient to shape novices into cognitively competent interpreters, or whether the interpreter advantage can emerge within a short time.
The present study examines the effect of a two-week intensive training programme on EF and interpreting performance. We aim to evaluate (1) the cognitive changes induced by short, intensive training and (2) the extent to which students’ progress is influenced by their pre-existing cognitive profiles. Our first hypothesis is that cognitive changes can be initiated at the onset of training, when cognitive demand greatly exceeds supply; we test this by comparing EF before and after the training. Our second hypothesis is that pre-existing cognitive abilities support changes in EF and interpreting performance. In other words, following the ‘Rich-Get-Richer Hypothesis’ (Hambrick & Engle, 2003), individuals with higher cognitive abilities may acquire SI skills more efficiently and enhance their EF to a greater extent.
2. Methods
To investigate how EF influences student performance at the critical stage of training onset, the present study focused on a two-week intensive interpreting training programme combining paced sight translation (STR) and SI sessions. All participants completed a pre-test and a post-test, each comprising three parts: (1) paced STR performance, (2) SI performance and (3) EF tasks. The study was approved by the Ethics Committee of Durham University for empirical research with human participants (research project ID: MLAC-2022-07-05T12_17_30-xmpw64).
2.1. Participants
A total of 53 students (2 male, 1 preferring not to disclose gender; mean age = 23.80, SD = 1.88) consented to take part in the research. All participants were enrolled in a language-related postgraduate programme at Durham University at the time of the study. Twenty-six participants were enrolled in the two-week intensive interpreting training programme (the experimental group), and 27 were taking regular university courses during the experiment (the control group). Five participants in the control group were excluded from data analysis because of data quality issues. All remaining participants (experimental group: n = 26; control group: n = 22) had Mandarin Chinese as their L1 and English as their L2. The language of instruction in the interpreting training programme was Mandarin Chinese. A language and interpreting history questionnaire was administered before the experiment. L2 proficiency was assessed with an objective measure (IELTS score) and a subjective measure (self-rated proficiency). The participants’ prior experience with SI and STR was brief (SI: mean = 5.90 hours, SD = 4.18 hours; STR: mean = 10.11 hours, SD = 7.02 hours), considerably less than the one-semester training reported in Dong et al. (2018). The participants had basic knowledge of interpreting techniques but were not yet qualified to undertake any interpreting assignments, and were thus considered to be at the beginner stage of interpreting training. Statistical comparisons confirmed that the two groups were matched in language and interpreting experience (see Table 1 for a summary of their language profiles).
Table 1. Background characteristics of participants (group means and SDs in brackets) and between-group statistical comparison results using the Wilcoxon rank sum test

2.2. The training programme
The programme was designed to develop the bilingual skills and knowledge essential to SI, serving as a pathway to professional conference interpreting, and was therefore intended for students with adequate bilingual proficiency but little training in SI. During the first week (5 days), students attended 2 hours of lectures and 4 hours of paced STR practice daily. The second week combined SI practice (4 hours a day) with paced STR practice (2 hours a day). In total, the programme comprised 20 hours of SI training and 30 hours of STR training over the two weeks. Because SI involves multitasking under time pressure, novices may find it hard to keep up with the task at the beginning of training. STR, the oral reformulation of a written text from one language into another, has long been recognised as a pedagogical exercise for novice interpreters (Agrifoglio, 2004). STR is expected to foster rapid responses and oral fluency; this programme used paced STR, in which the written input unfolds over time as the auditory input does in SI, to prepare students quickly for performing SI. The only difference between paced STR and SI is the input modality: source-language input in STR is visual, whereas in SI it is auditory. Paced STR can therefore serve as a transitional pedagogy for SI acquisition, as it shares several features with SI. Paced STR functions at three levels: it (1) strengthens the rapid activation of translation equivalents in the predetermined translation direction, (2) familiarises students with chunking strategies and (3) helps them cope with time pressure. By practising paced STR before SI, students could adapt to certain task features of SI in advance.
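The stated training totals can be checked with simple arithmetic. The text gives the number of days only for the first week, so this check assumes that the second week also comprised five training days:

```latex
\begin{aligned}
\text{STR} &= \underbrace{5 \times 4\,\text{h}}_{\text{week 1}} + \underbrace{5 \times 2\,\text{h}}_{\text{week 2}} = 30\,\text{h} \\
\text{SI} &= 5 \times 4\,\text{h} = 20\,\text{h} \quad (\text{week 2 only}) \\
\text{Lectures} &= 5 \times 2\,\text{h} = 10\,\text{h} \quad (\text{week 1 only})
\end{aligned}
```

Under this assumption, the totals agree with the 20 hours of SI and 30 hours of STR training reported for the programme.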
2.3. Materials and procedure
Before the training, participants individually completed an STR task and an SI task. Given their limited familiarity with interpreting, the experimenter briefly explained and demonstrated both tasks. Participants were told that these tasks were preliminary diagnostic tests of their untrained, inherent bilingual abilities, intended solely for research purposes. They also completed three computerised cognitive tasks measuring inhibitory control, cognitive flexibility and working memory, respectively, and were instructed to respond as quickly and accurately as possible. The cognitive tasks were programmed and administered in PsychoPy (version 2021.1.2). The post-test, comprising the same tasks, was conducted one day after the training ended to minimise potential fatigue effects. Test timing for both groups was aligned with the training schedule. All five tasks were conducted in a laboratory. The materials and assessment methods for the STR and SI tasks and the cognitive tasks are described below.
2.4. Executive function measurements
2.4.1. Flanker task
The Flanker task, originally introduced by Eriksen and Eriksen (1974), was used to measure inhibitory control. The version used in the present study was adapted from Ellefson et al. (2017): participants viewed a row of five fish of identical shape and size and indicated the direction (left or right) in which the central fish (the target) was swimming. They pressed the right arrow key if the target swam rightward and the left arrow key if it swam leftward. On congruent trials, all five fish swam in the same direction. On simple incongruent trials, the target fish swam in the opposite direction to the four nontarget fish. On complex incongruent trials, the four nontarget fish swam in different directions, irrespective of the target's direction. The task comprised 108 trials, divided evenly among the three trial types (36 each) and presented in random order. There was no response time limit. After each response, a fixation cross appeared at the centre of the screen for 750 ms before the next trial.
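The trial structure described above can be sketched in Python. This is a minimal illustration, not the authors' PsychoPy script: the string encoding of each row ('L'/'R' per fish, with the target in the centre), the balancing of target direction within each trial type and the two-left/two-right flanker mix on complex incongruent trials are all assumptions made for the example.

```python
import random

TRIAL_TYPES = ("congruent", "simple_incongruent", "complex_incongruent")


def make_flanker_trials(n_total=108, seed=None):
    """Return a shuffled list of Flanker trials split evenly across trial types.

    Each row is a hypothetical 5-character string of 'L'/'R' swimming
    directions; index 2 is the central target fish.
    """
    if n_total % (2 * len(TRIAL_TYPES)):
        raise ValueError("n_total must split evenly by type and target direction")
    rng = random.Random(seed)
    per_type = n_total // len(TRIAL_TYPES)  # 36 per type when n_total = 108
    trials = []
    for trial_type in TRIAL_TYPES:
        for i in range(per_type):
            target = "R" if i % 2 == 0 else "L"  # half rightward, half leftward targets
            other = "L" if target == "R" else "R"
            if trial_type == "congruent":
                flankers = [target] * 4
            elif trial_type == "simple_incongruent":
                flankers = [other] * 4
            else:
                # Assumed mix for complex trials: two flankers each way, shuffled.
                flankers = [target, target, other, other]
                rng.shuffle(flankers)
            row = "".join(flankers[:2] + [target] + flankers[2:])
            trials.append({
                "type": trial_type,
                "row": row,
                "correct_key": "right" if target == "R" else "left",
            })
    rng.shuffle(trials)  # present the trial types in random order
    return trials
```

Calling `make_flanker_trials(seed=1)` yields 108 trials, 36 of each type; fixing the seed makes a given random order reproducible.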
2.4.2. Figure matching task
The Figure Matching task, adapted from Ellefson et al. (2017), was used to assess cognitive flexibility. On each trial, a target stimulus appeared at the centre of the screen, varying in shape (triangle or circle), colour (blue or red), or both. Two small figures were positioned in the bottom corners of the screen, one matching the target's shape and the other matching its colour. A cue at the top of the screen indicated which dimension (colour or shape) should guide the judgement on the current trial, and participants pressed the corresponding key to select the figure that matched the target on that dimension. The trials were presented in random order across four blocks of 32 trials (128 trials in total): two single blocks containing only colour or only shape trials, and two mixed blocks containing both. In the mixed blocks, trials followed a predetermined sequence in which the relevant dimension alternated every two trials (i.e., colour, colour, shape, shape, colour, colour, shape, shape, etc.). Whether a mixed block began with a colour trial or a shape trial was counterbalanced across participants. The task thus yielded two types of responses: repeat and switch. Repeat responses occurred in both single and mixed blocks, whereas switch responses occurred only in mixed blocks, because only there did the relevant dimension change from the previous trial. There was no response time limit.
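The mixed-block sequence and the resulting repeat/switch classification can be illustrated with a short Python sketch. The function names are hypothetical, and how the authors treated the first trial of a block is not reported; labelling it separately here is an assumption.

```python
def mixed_block_cues(n_trials=32, start="colour"):
    """Relevant dimension for each trial of a mixed block (AABB... pattern)."""
    if start not in ("colour", "shape"):
        raise ValueError("start must be 'colour' or 'shape'")
    other = "shape" if start == "colour" else "colour"
    pattern = (start, start, other, other)  # dimension changes every two trials
    return [pattern[i % 4] for i in range(n_trials)]


def label_transitions(cues):
    """Label each trial 'repeat' or 'switch' relative to the previous trial.

    The first trial of a block has no predecessor; labelling it 'first'
    (so it can be excluded) is an assumption, not a documented rule.
    """
    labels = []
    for i, cue in enumerate(cues):
        if i == 0:
            labels.append("first")
        elif cue == cues[i - 1]:
            labels.append("repeat")
        else:
            labels.append("switch")
    return labels
```

Under these assumptions, a 32-trial mixed block contains 16 repeat trials and 15 switch trials after its first trial, whereas a single block contains only repeat trials.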
2.4.3. Corsi block task
The Corsi Block task is a revised version of the classic test of visuospatial working memory (Corsi, 1972; Ellefson et al., 2017). Nine square frames were arranged randomly on a grey background. On each trial, a sequence of blue circles flashed on the screen, each filling one square frame for 500 ms before the next appeared. Immediately after the last circle, a cue appeared in the top-right corner of the screen prompting a response. Participants clicked the square frames in the order in which they had been filled (forward condition) or in the reverse order (backward condition). The task contained 18 forward trials and 18 backward trials. In each condition, sequence length increased by one item every two trials in a fixed ascending order, starting at two items and ending at nine items. Two practice trials, each with a two-item sequence, preceded the formal trials. Once participants submitted their response by pressing the corresponding key, they could not redo or revise their choices. Only data from the backward condition were used to index working memory.
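The difference between the forward and backward conditions can be made concrete with a small scoring sketch in Python. The integer frame IDs and the all-or-none rule (a trial counts as correct only if the entire sequence is reproduced in the required order) are assumptions for illustration; the text does not state the scoring rule used.

```python
def corsi_trial_correct(presented, response, condition="backward"):
    """True if the clicked frames reproduce the flashed sequence in the required order.

    Frames are identified by hypothetical integer IDs (e.g., 0-8 for nine frames).
    """
    if condition == "forward":
        expected = list(presented)
    elif condition == "backward":
        expected = list(reversed(presented))  # recall in reverse order
    else:
        raise ValueError("condition must be 'forward' or 'backward'")
    return list(response) == expected


def corsi_accuracy(trials, condition="backward"):
    """Proportion of (presented, response) pairs scored as correct."""
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(corsi_trial_correct(p, r, condition) for p, r in trials)
    return hits / len(trials)
```

For example, if frames 3, 7 and 1 flash in that order, the correct backward response is 1, 7, 3.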
2.5. STR and SI tasks
The texts for the STR task were three excerpts from a transcribed keynote speech (Footnote 1) that gave a general report on the British economy and contained few specialised terms. The excerpts were edited to retain complete information units and to be of comparable length. To ensure comparable linguistic complexity across the three texts, a range of lexical and syntactic indices was computed using TAASSC (Kyle, 2016; see Appendix 1 of the Supplementary Material). Before the STR task, participants were given five minutes to review the text to be sight translated, together with a bilingual glossary of terms. They were explicitly instructed not to use dictionaries or any online consultation tools during the experiment. The STR task had no time limit, but participants were instructed to aim for both accuracy and speed. Their performance was recorded with a microphone. Because of technical problems with the microphone, participants with more than 50% of their STR audio recordings missing were excluded from the analyses; as noted above, this led to the exclusion of five participants in the control group.
Materials for the SI task were two video clips extracted from another keynote speech on economics, delivered by a British male speaker (Footnote 2). The textual and auditory features of the clips are outlined in Appendix 2 of the Supplementary Material. Because the participants were new to SI, each clip was divided into three segments of one and a half minutes each (Dong et al., 2019). Participants interpreted one segment and took a short break before interpreting the next. Audio was delivered through head-mounted earphones, and the interpreting output was recorded with a microphone. The video of the speaker was displayed full screen during interpreting. The materials for both the STR and SI tasks were systematically alternated between the pre- and post-tests to balance exposure. For SI, half of the participants interpreted video clip A at pre-test and video clip B at post-test, while the other half took the reverse order. For STR, texts were assigned following a Latin square design so that materials were evenly distributed across tests. The same design was applied separately to each group.
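One way to realise this counterbalancing is sketched below in Python. The exact Latin square and the mapping of three texts onto two tests are not reported, so both are assumptions here: a cyclic 3 × 3 square over texts 1 to 3, with its first two columns used as the pre-test and post-test texts. The function name is hypothetical.

```python
def assign_materials(participant_ids):
    """Hypothetical counterbalancing plan for one group.

    SI: alternate clip orders A->B and B->A across participants.
    STR: cycle through the rows of a cyclic 3x3 Latin square over texts 1-3,
    using the first two columns as the pre-test and post-test texts.
    """
    texts = (1, 2, 3)
    square = [[texts[(row + col) % 3] for col in range(3)] for row in range(3)]
    plan = {}
    for i, pid in enumerate(participant_ids):
        si_pre, si_post = ("A", "B") if i % 2 == 0 else ("B", "A")
        str_row = square[i % 3]  # rows: (1, 2, 3), (2, 3, 1), (3, 1, 2)
        plan[pid] = {
            "SI_pre": si_pre, "SI_post": si_post,
            "STR_pre": str_row[0], "STR_post": str_row[1],
        }
    return plan
```

Calling the function once per group mirrors the statement that the design was applied separately to each group; with a multiple of six participants, every text appears equally often at pre-test and at post-test, and no participant receives the same material twice.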
2.6. Data analysis
Performance on the cognitive tasks and on the SI and STR tasks was analysed. For the cognitive tasks, accuracy and response time were combined into an efficiency score so that speed-accuracy trade-offs could be accounted for (as recommended by Ellefson et al., 2017). The efficiency score was calculated by dividing accuracy (number of correct responses divided by total trials) by mean response time (in seconds) on correct trials. SI and STR performance was judged by output quality, assessed with a rubric-based rating scale consisting of three 8-point subscales: information completeness (InfoCom), fluency of delivery (FluDel) and target language quality (TLQual) (Han, 2015; see Appendix 3 of the Supplementary Material). In calculating the total score, InfoCom was weighted 50%, and FluDel and TLQual were weighted 25% each (Lee, 2015). Two professional interpreters rated the STR performance, and another two with similar experience rated the SI performance. The raters received brief training on the rating scale and procedure, followed by discussions within each pair to ensure a shared understanding of the scale descriptors. Intraclass correlation coefficients (ICCs) were calculated to determine interrater reliability for the SI and STR tasks and were acceptable: STR pre-test, ICC = .85; STR post-test, ICC = .86; SI pre-test, ICC = .74; SI post-test, ICC = .86.
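The two scoring formulas above can be written as a minimal Python sketch. The function names and input formats are illustrative assumptions (the authors' analyses were run in R), and returning zero when a participant has no correct trials is an assumption not covered by the text.

```python
from statistics import mean


def efficiency_score(correct, rts_seconds):
    """Accuracy (proportion correct) divided by mean RT (s) on correct trials.

    `correct` holds 1/0 (or True/False) per trial; `rts_seconds` the matching RTs.
    """
    if not correct or len(correct) != len(rts_seconds):
        raise ValueError("need one RT per trial and at least one trial")
    accuracy = sum(correct) / len(correct)
    correct_rts = [rt for ok, rt in zip(correct, rts_seconds) if ok]
    if not correct_rts:
        return 0.0  # assumption: no correct trials -> zero efficiency
    return accuracy / mean(correct_rts)


def weighted_total(info_com, flu_del, tl_qual):
    """Total quality score: 50% InfoCom + 25% FluDel + 25% TLQual."""
    return 0.5 * info_com + 0.25 * flu_del + 0.25 * tl_qual
```

For example, three correct responses out of four, with correct-trial RTs of 0.5, 0.7 and 0.6 s, give an accuracy of .75, a mean correct RT of 0.6 s and an efficiency of .75 / 0.6 = 1.25. Subscale ratings of 6 (InfoCom), 4 (FluDel) and 5 (TLQual) give a total of 0.5 × 6 + 0.25 × 4 + 0.25 × 5 = 5.25.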
All data were processed and wrangled with the tidyverse package (Wickham et al., 2019) in R (R Core Team, 2021) using RStudio (Version 1.4.1717). In the main analysis, we examined the training effect with 2 (group: experimental vs. control) × 2 (measure time: pre-test vs. post-test) analyses of variance (ANOVAs), with group as a between-subjects factor and measure time as a within-subjects factor, using aov() in the stats package (R Core Team, 2021). Because the efficiency scores in the cognitive tasks and the overall scores and sub-scores of STR and SI performance were mostly non-normally distributed, we rank-transformed the dependent variables before analysis (i.e., a rank-transform, non-parametric ANOVA; Conover & Iman, 1981). Significant main and interaction effects were followed up with post hoc Tukey’s HSD tests using the emmeans package (Lenth, 2022). To examine how initial cognitive abilities supported improvement, we ran simple and multiple regressions with lm() in the stats package for the experimental group only.
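The rank-transform approach can be illustrated with a self-contained Python sketch of a split-plot (mixed) ANOVA on ranked scores. This is not the authors' R code. It assumes that all observations of a dependent variable are ranked jointly across groups and time points, with tied values receiving average ranks, as in Conover and Iman's rank transform. It computes sequential (Type I) sums of squares of the kind aov() reports; other sums-of-squares types can give a different time effect when group sizes are unequal. The sketch returns F ratios and degrees of freedom only; p-values would additionally require the F distribution (e.g., scipy.stats.f.sf).

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    return ranks


def rank_mixed_anova(data):
    """Split-plot ANOVA on jointly ranked scores.

    `data` maps subject -> (group_label, [score at time 1, score at time 2, ...]).
    Returns {effect: (F, df_effect, df_error)} for group, time and group:time.
    """
    subjects = list(data)
    n_times = len(data[subjects[0]][1])
    ranks = average_ranks([x for s in subjects for x in data[s][1]])
    y = {s: ranks[i * n_times:(i + 1) * n_times] for i, s in enumerate(subjects)}
    group_of = {s: data[s][0] for s in subjects}
    groups = sorted(set(group_of.values()))
    members = {g: [s for s in subjects if group_of[s] == g] for g in groups}
    n_subj, n_groups, times = len(subjects), len(groups), range(n_times)

    grand = sum(ranks) / len(ranks)
    subj_mean = {s: sum(y[s]) / n_times for s in subjects}
    group_mean = {g: sum(subj_mean[s] for s in members[g]) / len(members[g]) for g in groups}
    time_mean = [sum(y[s][t] for s in subjects) / n_subj for t in times]
    cell_mean = {(g, t): sum(y[s][t] for s in members[g]) / len(members[g])
                 for g in groups for t in times}

    # Between-subjects stratum: group vs. subjects within groups.
    ss_group = n_times * sum(len(members[g]) * (group_mean[g] - grand) ** 2 for g in groups)
    ss_subj = n_times * sum((subj_mean[s] - group_mean[group_of[s]]) ** 2 for s in subjects)
    # Within-subjects stratum: time, group x time and residual.
    ss_time = n_subj * sum((m - grand) ** 2 for m in time_mean)
    ss_inter = sum(len(members[g]) * (cell_mean[g, t] - group_mean[g] - time_mean[t] + grand) ** 2
                   for g in groups for t in times)
    ss_err = sum((y[s][t] - subj_mean[s] - cell_mean[group_of[s], t] + group_mean[group_of[s]]) ** 2
                 for s in subjects for t in times)

    df_group, df_subj = n_groups - 1, n_subj - n_groups
    df_time, df_inter = n_times - 1, (n_groups - 1) * (n_times - 1)
    df_err = (n_subj - n_groups) * (n_times - 1)
    ms_subj, ms_err = ss_subj / df_subj, ss_err / df_err
    return {
        "group": (ss_group / df_group / ms_subj, df_group, df_subj),
        "time": (ss_time / df_time / ms_err, df_time, df_err),
        "group:time": (ss_inter / df_inter / ms_err, df_inter, df_err),
    }
```

Ranking before the ANOVA replaces raw efficiency scores and ratings with their positions in the pooled sample, which reduces the influence of skewed distributions and outliers at the cost of discarding interval information.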
3. Results
3.1. Effect of training on EF
Descriptive statistics for EF performance at pre- and post-test in the experimental and control groups are detailed in Appendix 4 of the Supplementary Material. At pre-test, Wilcoxon rank sum tests showed no significant between-group difference in the Flanker task (W = 298, p = .8), but significant differences in the Figure Matching task (W = 423, p < .001) and the Corsi Block task (W = 474, p < .001), with the control group performing better initially.
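To help interpret the reported W values, the rank sum statistic can be computed directly from its pairwise definition, as in the Python sketch below. The function names are illustrative; only the statistic is computed, not the exact or approximate p-value that R's wilcox.test() also provides.

```python
def wilcoxon_w(x, y):
    """Rank-sum statistic W: pairs (xi, yj) with xi > yj, ties counted as 0.5.

    This equals the rank sum of `x` in the pooled sample minus n_x(n_x + 1)/2,
    the form R's wilcox.test(x, y) reports.
    """
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def null_expected_w(n_x, n_y):
    """Expected W when the two groups do not differ: n_x * n_y / 2."""
    return n_x * n_y / 2
```

With 26 and 22 participants, W can range from 0 to 572, with an expected value of 286 when the groups do not differ; values far from 286 indicate a group difference, and which group is favoured depends on the order in which the two samples are passed.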
A two-way ANOVA (group × measure time) was then performed on the efficiency scores to test the effect of training on EF. Table 2 summarises the results for the three cognitive tasks. A main effect of measure time was found for all three tasks: post hoc analysis showed that efficiency scores were higher at post-test than at pre-test in the Flanker task (p = .040), the Corsi Block task (p = .001) and the Figure Matching task (p = .001). A main effect of group was found for the Figure Matching task, with post hoc analysis showing significantly higher efficiency in the control group than in the experimental group (p = .045). A significant measure time × group interaction was also found for the Figure Matching task, driven by a significantly higher efficiency score at post-test than at pre-test in the experimental group (p < .001).
Table 2. Two-way ANOVAs comparing measure time and group differences of EF task performance

Note: The asterisk indicates significant differences at α = .05; * = p < .05, ** = p < .01, *** = p < .001. Higher rank represents better performance. Acronyms: (1) Pre: pre-test, (2) Post: post-test, (3) Exp: experimental group, (4) Contr: control group, (5) ExpPre: pre-test of experimental group and (6) ExpPost: post-test of experimental group.
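With only two measurement times, the within-subject part of a 2 × 2 mixed ANOVA such as the one summarised in Table 2 reduces to an analysis of each participant’s difference score (post-test minus pre-test). The group × measure time interaction F equals a one-way between-groups ANOVA F on the difference scores, and the measure-time F tests whether the mean difference departs from zero, both on (1, N − 2) degrees of freedom. The minimal Python sketch below (standard library only; the function name `mixed_2x2_within` is hypothetical) illustrates this. It assumes complete pre/post data for every participant and a repeated-measures specification in which measure time is entered before the interaction (as in R’s `aov()` with an `Error()` stratum for participants). In the study, the inputs would be the rank-transformed scores; the Tukey post hoc tests are not reproduced here.

```python
from statistics import fmean


def mixed_2x2_within(pre, post, group):
    """Within-subject F tests of a 2 (group) x 2 (time) mixed ANOVA via difference scores.

    pre, post: one score per participant at each time (complete data assumed).
    group: one label per participant; exactly two distinct labels expected.
    Returns F for measure time, F for group x time, and their shared df.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    labels = sorted(set(group))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    n = len(diffs)
    grand = fmean(diffs)
    ss_interaction = 0.0  # between-group spread of the mean differences
    ss_error = 0.0        # within-group spread of the differences (subject x time)
    for label in labels:
        d_g = [d for d, g in zip(diffs, group) if g == label]
        mean_g = fmean(d_g)
        ss_interaction += len(d_g) * (mean_g - grand) ** 2
        ss_error += sum((d - mean_g) ** 2 for d in d_g)
    ms_error = ss_error / (n - 2)
    # The common factor of 1/2 in the within-stratum sums of squares cancels.
    return {
        "F_time": n * grand ** 2 / ms_error,
        "F_group_x_time": ss_interaction / ms_error,
        "df": (1, n - 2),
    }


# Hypothetical rank-transformed scores for four participants.
res = mixed_2x2_within(
    pre=[1, 2, 3, 4],
    post=[2, 4, 6, 8],
    group=["contr", "contr", "exp", "exp"],
)
```

Framing the interaction as a group difference in change scores also makes clear why a pre-test imbalance between groups, as found for the Figure Matching task, complicates its interpretation.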
The training effect found for the Figure Matching task could not yet be confirmed, given that the experimental and control groups were not matched in performance at pre-test. We therefore conducted a one-way analysis of covariance (ANCOVA) on post-test efficiency scores in the Figure Matching task, with group as the factor and pre-test scores as the covariate. The ANCOVA revealed that pre-test scores were a significant covariate, F(1, 45) = 77.63, p < .001, indicating that pre-test performance was strongly associated with post-test scores. After controlling for pre-test scores, there was a significant effect of group on post-test scores, F(1, 45) = 5.87, p = .019, suggesting that the training had a significant effect on participants’ post-test performance.
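The ANCOVA logic can be sketched as a comparison of two nested regression models: post-test scores regressed on pre-test scores alone, versus on pre-test scores plus group with a common slope. The group F is the reduction in residual sum of squares gained by adding group, divided by the residual mean square of the fuller model on N − k − 1 degrees of freedom; with k = 2 groups, the reported F(1, 45) would correspond to N = 48. The minimal Python sketch below (standard library only; names hypothetical) is not the authors’ R code. It assumes the covariate is entered before group, as in R’s sequential `aov(post ~ pre + group)`, so the covariate F is not adjusted for group whereas the group F is adjusted for the covariate.

```python
from statistics import fmean


def _centred_sums(xs, ys):
    """Sums of squares and cross-products about the means: Sxx, Syy, Sxy."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ancova_one_covariate(pre, post, group):
    """One-way ANCOVA: effect of `group` on `post`, controlling for `pre`.

    Assumes parallel regression slopes across groups (no group x pre term)
    and sequential entry of the covariate before group.
    """
    n = len(post)
    labels = sorted(set(group))
    k = len(labels)
    sxx_t, syy_t, sxy_t = _centred_sums(pre, post)
    # Pool the within-group sums to obtain the common-slope (full) model.
    sxx_w = syy_w = sxy_w = 0.0
    for label in labels:
        xs = [x for x, g in zip(pre, group) if g == label]
        ys = [y for y, g in zip(post, group) if g == label]
        a, b, c = _centred_sums(xs, ys)
        sxx_w, syy_w, sxy_w = sxx_w + a, syy_w + b, sxy_w + c
    ss_covariate = sxy_t ** 2 / sxx_t       # pre entered first
    sse_reduced = syy_t - ss_covariate      # residual SS of post ~ pre
    sse_full = syy_w - sxy_w ** 2 / sxx_w   # residual SS of post ~ pre + group
    df_error = n - k - 1
    ms_error = sse_full / df_error
    return {
        "F_covariate": ss_covariate / ms_error,
        "F_group": (sse_reduced - sse_full) / (k - 1) / ms_error,
        "df_group": (k - 1, df_error),
    }


# Hypothetical Figure Matching efficiency scores for six participants.
res = ancova_one_covariate(
    pre=[1, 2, 3, 1, 2, 3],
    post=[2, 3, 5, 4, 5, 6],
    group=["contr"] * 3 + ["exp"] * 3,
)
```

The parallel-slopes assumption is what allows the pooled within-group regression to serve as the full model; a group × pre-test interaction term would be needed to test that assumption.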
3.2. Effect of training on interpreting performance
Two-way ANOVAs (group × measure time) were also performed on both the overall scores and the sub-scores of STR and SI to test whether the experimental group improved more than the control group in interpreting performance. Table 3 shows the results of the analyses for STR performance. A significant main effect of measure time was found for FluDel: post hoc analysis revealed that participants’ FluDel scores in STR were higher in the post-test than in the pre-test (p = .017). There was also a significant interaction between measure time and group for FluDel. Post hoc analysis showed that this effect was due to the experimental group scoring significantly higher on FluDel in the post-test than in the pre-test (p = .002). However, no main effect or interaction effect was found for overall performance quality, TLQual or InfoCom.
Table 3. Two-way ANOVAs comparing measure time and group differences of STR performance

Note: The asterisk indicates significant differences at α = .05; * = p < .05. Higher rank represents better performance. Acronyms: (1) InfoCom: Information Completeness, (2) FluDel: Fluency of Delivery, (3) TLQual: Target Language Quality, (4) Pre: pre-test, (5) Post: post-test, (6) Exp: experimental group, (7) Contr: control group, (8) ExpPre: pre-test of experimental group and (9) ExpPost: post-test of experimental group.
Table 4 shows the results of the analyses for SI performance. A main effect of measure time was found for the overall scores and the sub-scores. Post hoc analysis revealed that participants performed better in the post-test than in the pre-test on all scores (all ps < .001). Interaction effects were also observed for the overall scores and the sub-scores. Post hoc analysis showed that, on all scores, the experimental group outperformed the control group to a significantly greater extent in the post-test than in the pre-test (all ps < .001).
Table 4. Two-way ANOVAs comparing measure time and group differences of SI performance

Note: The asterisk indicates significant differences at α = .05; * = p < .05, ** = p < .01, *** = p < .001. Higher rank represents better performance. Acronyms: (1) InfoCom: Information Completeness, (2) FluDel: Fluency of Delivery, (3) TLQual: Target Language Quality, (4) Pre: pre-test, (5) Post: post-test, (6) Exp: experimental group, (7) Contr: control group, (8) ExpPre: pre-test of experimental group and (9) ExpPost: post-test of experimental group.
3.3. Relationship between initial EF and EF changes
Simple regression analyses were conducted for the experimental group only to examine the relationship between participants’ pre-existing EF and their improvements in EF following training. Pre-test performance in each of the three EF tasks negatively predicted the corresponding improvement after training; that is, individuals with stronger initial EF showed smaller improvements from pre-test to post-test. Specifically, pre-existing inhibition was a significant predictor of improvement in inhibition (β = −.78, R² = .60, F(1, 24) = 37.82, p < .001), pre-existing cognitive flexibility was a significant predictor of improvement in cognitive flexibility (β = −.51, R² = .23, F(1, 24) = 8.51, p = .008) and pre-existing working memory was a significant predictor of improvement in working memory (β = −.53, R² = .25, F(1, 24) = 9.48, p = .005).
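Because each of these models has a single predictor, the standardised β equals Pearson’s r between pre-test score and gain, R² = β², and F(1, n − 2) = R²(n − 2)/(1 − R²). R’s `summary(lm())` prints both this R² (‘Multiple R-squared’) and an adjusted R² = 1 − (1 − R²)(n − 1)/(n − 2). The minimal Python sketch below (standard library only; the function name `baseline_gain_regression` is hypothetical) computes these quantities. It assumes the gain is defined as post-test minus pre-test efficiency; the section does not state whether raw or rank-transformed scores entered the regressions.

```python
import math


def baseline_gain_regression(pre, post):
    """Simple regression of gain (post - pre, an assumed definition) on the pre-test score."""
    gain = [b - a for a, b in zip(pre, post)]
    n = len(pre)
    mx = sum(pre) / n
    my = sum(gain) / n
    sxx = sum((x - mx) ** 2 for x in pre)
    syy = sum((y - my) ** 2 for y in gain)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, gain))
    beta = sxy / math.sqrt(sxx * syy)          # standardised slope = Pearson r
    r2 = beta ** 2
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    f_stat = r2 * (n - 2) / (1 - r2)           # on (1, n - 2) df
    return {
        "beta": beta,
        "slope": sxy / sxx,                    # unstandardised slope
        "R2": r2,
        "adj_R2": adj_r2,
        "F": f_stat,
        "df": (1, n - 2),
    }


# Hypothetical pre/post scores for four trainees.
res = baseline_gain_regression(pre=[1, 2, 3, 4], post=[3, 3, 4, 4])
# A negative beta means higher baseline scores go with smaller gains.
```

With n = 26 (the reported df of 24), the F statistic follows directly from β, so the three reported values for each model can be checked against one another.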
3.4. Relationship between initial EF and changes in interpreting performance
Multiple regression analyses were conducted to explore the contribution of initial EF to improvements in STR and SI performance (see Table 5). None of the EF subcomponents predicted improvements in any aspect of STR performance. However, cognitive flexibility significantly and positively predicted improvement in SI performance, explaining 20% of the variance in overall SI improvement. It also significantly and positively predicted improvement in SI FluDel. Taken together, however, the EF subcomponents did not significantly explain the variance in FluDel improvement in SI.
Table 5. Multiple regression testing the contribution of executive functions to STR and SI

Note: The asterisk indicates significant differences at α = .05; * = p < .05. Acronyms: (1) InfoCom: Information Completeness, (2) FluDel: Fluency of Delivery, (3) TLQual: Target Language Quality.
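A model of the kind summarised in Table 5 regresses an improvement score on the three pre-test EF scores simultaneously. The minimal Python sketch below (standard library only; the function name `ols_fit` and all variable names are hypothetical) fits ordinary least squares through the normal equations. It reports the unstandardised coefficients, the standardised βs, R² and the overall model F on (k, n − k − 1) degrees of freedom. It is an illustration under assumptions, not the authors’ `lm()` call. Gaussian elimination on the normal equations is adequate for three predictors but is less numerically stable than the QR decomposition that R uses.

```python
from statistics import stdev


def ols_fit(predictors, y):
    """OLS of y on the columns in `predictors` plus an intercept.

    Returns (coefficients, standardised betas, R^2, model F); coefficients[0]
    is the intercept. Raises ValueError for a singular design matrix.
    """
    n, k = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = k + 1
    # Augmented normal equations [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("singular design matrix")
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, p):
            factor = m[r][c] / m[c][c]
            for j in range(c, p + 1):
                m[r][j] -= factor * m[c][j]
    # Back substitution.
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        coef[i] = (m[i][p] - sum(m[i][j] * coef[j] for j in range(i + 1, p))) / m[i][i]
    fitted = [sum(b * x for b, x in zip(coef, r)) for r in rows]
    y_mean = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - sse / sst
    f_model = (r2 / k) / ((1 - r2) / (n - k - 1)) if r2 < 1 else float("inf")
    # Standardised betas rescale each slope by sd(x_j) / sd(y).
    betas = [coef[j + 1] * stdev(predictors[j]) / stdev(y) for j in range(k)]
    return coef, betas, r2, f_model


# Hypothetical example: SI gain against three pre-test EF scores.
inhibition = [1, 2, 3, 4, 5, 6]
flexibility = [2, 1, 4, 3, 6, 5]
working_memory = [1, 0, 0, 1, 1, 0]
si_gain = [1.5, 4, 3, 6.5, 5.5, 8]  # constructed as 1 + 2*inh - flex + 0.5*wm
coef, betas, r2, f = ols_fit([inhibition, flexibility, working_memory], si_gain)
```

Because the toy outcome is an exact linear function of the predictors, the fit recovers the generating coefficients and R² is 1; with real data, the per-predictor tests and the overall model F can disagree, since each addresses a different question.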
4. Discussion
The study aimed to investigate whether short-term interpreting training could lead to the acquisition of an interpreter advantage and whether pre-existing EF modulated the training gains. All three EF subcomponents (inhibitory control, cognitive flexibility and working memory) improved significantly over the two weeks in both the experimental and control groups, which likely reflects a learning effect brought about by performing the same cognitive tasks twice within a short interval, irrespective of whether participants attended the training programme. Only cognitive flexibility improved significantly more in the experimental group than in the control group, suggesting an effect of training. In terms of interpreting performance, the experimental group improved significantly in overall SI performance, whereas its improvement in STR was confined to fluency of delivery. Among those who attended the training programme, individual pre-existing cognitive flexibility predicted progress in SI, in both overall performance and fluency of delivery. Additionally, regression analyses revealed that baseline performance on each of the three EF subcomponents was negatively related to the change in that subcomponent over the training period, suggesting that cognitively less advantaged individuals benefitted more from the training.
4.1. Presence of interpreter advantage
Previous longitudinal studies have established the pivotal role of EF in interpreting, with cognitive flexibility found to be the most critical component (see Nour et al., 2020, for a systematic review). Our study supports previous findings on cognitive flexibility: the interaction effect we observed indicated significantly greater improvement in the experimental group than in the control group. This training effect on cognitive flexibility is consistent with the results of Dong and Liu (2016), who observed an improvement in switch cost during a half-year interpreting course, and with the findings of Macnamara and Conway (2016), who detected training effects using the WCST in the context of sign-language SI over a two-year period. Although these studies differ in the specific types of interpreting involved, the converging evidence suggests that interpreting, which entails constant language shifting, relies heavily on the domain-general EF of switching.
As for working memory, our study aligns with the findings of Babcock et al. (2017), who observed no significant training effect on working memory capacity using the operation span task and the symmetry span task. However, this outcome differs from that of Macnamara and Conway (2016), who identified an interpreter advantage in working memory that was enhanced by training. Importantly, the two studies measured different dimensions of working memory: our study focused on visual-spatial working memory, whereas Macnamara and Conway (2016) examined digit and verbal working memory. Given that interpreting is fundamentally a language task, working memory is predominantly engaged at the verbal level. A meta-analysis by Wen and Dong (2019) showed that the interpreter advantage was less pronounced in visual-spatial span tasks than in verbal and digit span tasks. Since Macnamara and Conway’s (2016) study involved conversion between verbal language and sign language, it is likely that the coordination and distribution of verbal and non-verbal actions were both fundamentally underpinned by domain-general EF, so that the verbally oriented cognitive changes in their study would generalise more substantially. This may explain why working memory capacity improved after training in their study, whereas visual-spatial working memory capacity did not in ours. It should be noted that we did not use a verbal measure of working memory, because the aim of this study was to examine domain-general EF; this required controlling for potential confounds that verbal skills could introduce into a verbal working memory test.
As far as our results show, short-term interpreting training did not enhance working memory at the domain-general level; whether it enhances verbal working memory remains uncertain and should be addressed in future studies.
In terms of inhibition, no prior evidence supports the hypothesis that interpreter training enhances inhibition (Dong & Liu, 2016; Dong & Xie, 2014), which is consistent with our findings. There seems to be converging evidence that inhibition is highly insensitive to interpreting training, particularly when assessed through behavioural measures.
Overall, our results are consistent with the findings of Nour et al. (2020). That our study yielded results comparable to those of longitudinal studies lasting at least a semester (e.g., Dong & Liu, 2016; Macnamara & Conway, 2016) suggests that short, intensive interpreting training can also induce noticeable cognitive changes. The significant improvement in cognitive flexibility might also be attributed to participants’ limited prior interpreting experience. In another study examining the interpreter advantage in coordination (Zhong & Dong, 2023), no significant differences were found between interpreting students at the beginning of their training and control bilinguals. Both Zhong and Dong (2023) and our study recruited beginner-level interpreting students, but the participants in their study had received several hours of weekly training for eight weeks before being tested and were thus more experienced at interpreting than those in our study. It appears that cognitive changes are most observable when learners begin to acquire interpreting skills from scratch. Our results suggest that both the intensity of interpreting training and the training stage at which participants are tested matter for interpreting-induced cognitive changes. However, it is premature to draw conclusions about the full picture of such cognitive changes, since we did not evaluate the retention of cognitive gains after the training period had ended.
4.2. Pre-existing executive function and training gains
Our findings show not only general improvements in SI performance but also improvements in all three aspects of interpreting quality. In STR, however, the training effect was evident only in fluency of delivery. One possible explanation is that STR served as a preparatory exercise for SI in the training. As a result, in the speed-accuracy trade-off, delivery rate was emphasised over accuracy in order to develop cognitive strategies for the concurrent language production required in SI. In addition, STR allowed greater flexibility than SI in coordinating comprehension and production. In STR, the source text was presented visually in its entirety and could be readily revisited and reprocessed. Participants therefore did not need to retain source information in memory, as is typically necessary in SI, which potentially reduced cognitive load and enhanced fluency. Even so, the improvements in SI performance could still be attributed to paced STR functioning as a buffer. Alternatively, these improvements might be attributable to the combined effect of training intensity and novices’ untapped potential for interpreting development.
As to whether participants’ training gains depended on their pre-existing cognitive profile, only cognitive flexibility predicted gains, namely in overall SI performance and in SI fluency of delivery. This suggests that individuals proficient in shifting between tasks were more adept at managing the constant switching between comprehending one language and producing another, which may have allowed them to preserve cognitive resources that would otherwise be consumed by the task and thereby enhance their performance. Notably, this effect was absent in paced STR, indicating that strong switching ability is particularly relevant to the simultaneous execution of language tasks rather than to cross-language conversion per se, which is characteristic of STR. However, this result contrasts with that of Macnamara and Conway (2016), who did not identify pre-existing cognitive abilities as significant predictors of performance improvement. The discrepancy may be attributed to differences in training duration and task characteristics. In their two-year longitudinal training, performance improvements could be gradual and participants’ cognitive abilities might fluctuate with recent experience, whereas our intensive short-term training may have been better positioned to detect subtle changes in task performance linked to initial EF. Also, the absence of a relationship between pre-existing cognitive profiles and training gains in sign-language SI (Macnamara & Conway, 2016) may suggest that concurrent verbal processing relies more heavily on cognitive flexibility than does cross-modality (verbal and motor) conversion.
Our results also point to a short-term development of EF shaped by interpreting training. Notably, baseline performance on all three EF subcomponents was negatively correlated with the corresponding EF gains, suggesting that individuals who initially performed well on EF tasks were less likely to make additional cognitive gains. This finding has several implications for the interaction between cognitive profiles and the SI task. First, it suggests that the development of EF is not purely cumulative and may exhibit a ceiling effect. Second, individuals who begin with superior EF may have less room for cognitive growth. One possible explanation is that individuals with less developed EF face higher cognitive demands, requiring them to draw on more cognitive resources and consequently showing larger gains in EF. In contrast, those who had sufficient cognitive resources at the outset may not need a substantial increase in EF to complete the task. According to the expert performance account (Feltovich et al., 2006), novices rely on domain-general cognition to solve problems, whereas individuals with more experience and practice tend to process and synthesise information in their field of expertise at a deeper level, often without increased cognitive consumption. In the case of SI, novices may initially be overwhelmed by the high cognitive demands of the task; with increased exposure, however, they tend to rely less on generic cognitive abilities, resulting in less pronounced enhancement of EF. Thus, more cognitively capable individuals may be better positioned to shift from a generic to an expertise-related use of cognitive abilities.
5. Conclusion
As an extreme bilingual task, SI is often associated with good or enhanced domain-general cognitive control abilities. The present study demonstrates the bidirectional relationship between EF and interpreting task performance in a two-week intensive training programme. We found that cognitive flexibility, as a subcomponent of EF, plays a critical role in facilitating the effective acquisition of interpreting skills, offering a distinct advantage in enhancing fluency. Individuals who are proficient at switching thus appear to have an inherent advantage when learning to interpret. At the same time, cognitive flexibility itself grows as a result of training. Moreover, beginner interpreters with lower cognitive efficiency may derive greater cognitive benefits from such training. The relationship between EF and interpreting expertise therefore cannot simply be explained as ‘the rich getting richer’. Instead, the development of EF is triggered by cognitive demands that cannot yet be met, and individuals vary in the size of the cognitive gap they have to fill. Since the training lasted only a short time, we cannot determine whether EF continues to evolve as participants become more skilled at the task. Liu et al. (2004) suggested that skilled interpreters relied more on experience-based judgement and selection than on extensive exploitation of working memory. We therefore hypothesise that, with further practice, the focus of acquisition would shift from managing cognitive overload to interpreting-specific skills such as linguistic techniques or delivery. The emergence of expertise would be reflected in a reduced dependence on domain-general cognitive resources and an increased use of domain-specific skills.
This study has implications for the development of EF and expert performance in general. We tend to view EF as a set of generic cognitive components that underpin the success or failure of complex tasks. However, empirical evidence is sometimes contradictory; examples include the replication crisis of the bilingual advantage (Bak, 2016) and the conflicting evidence for the interpreter advantage in working memory (Nour et al., 2020). An alternative way to interpret such results may be to conceptualise EF as specific to the skills or task goals under discussion, since these skills are acquired, and these task goals activated, through specific sets of values, expectations, knowledge and beliefs (Doebel, 2020). It is therefore important to consider the development of EF in relation to specific scenarios and stages of expertise and skill development. Constrained by its classroom-based, quasi-experimental design, the present study did not include a delayed post-test to track the retention of gains. Future studies exploring the interpreter advantage in intermediate- or advanced-level interpreting training would provide a more comprehensive picture of how EF develops as it is shaped by interpreting experience.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S136672892500032X.
Data availability statement
The data that support the findings of this study are openly available in the Durham Research Online DATAsets Archive at http://doi.org/10.15128/r1jq085k03z.
Acknowledgements
We would like to thank Don Starr, Dr Huolingxiao Kuang and Renwen Xu for their assistance with data collection. We also thank the anonymous reviewers for their constructive comments on earlier versions of this paper and all the students who participated in the study. Special thanks go to Dr Kevin Lin for his invaluable contribution to the training programme.
Funding statement
This work was supported by a grant from the National Social Science Foundation of China (Grant No. 20BYY014) to B.Z.
Competing interest
The authors declare none.