Introduction
The present paper brings together research from three areas in early childhood education: developmental language disorder (DLD), narrative development, and cross-linguistic transfer. DLD is diagnosed when a child performs below age expectations on a range of linguistic tasks in the absence of hearing loss, intellectual disability, sensory impairment, or other neurological conditions (Leonard, 2017). To be diagnosed with bilingual DLD (BiDLD), bilingual children must meet these inclusionary and exclusionary criteria in both of their languages (Abutbul-Oz & Armon-Lotem, 2022; Armon-Lotem et al., 2021). A coherent narrative involves both macrostructure elements (reference to characters, a problem, feelings, a goal, an attempt, an outcome, and an internal response) and microstructure elements (vocabulary and syntax), and the development of both components is crucial for successful storytelling (Gillam et al., 2018; Pico et al., 2021). DLD poses challenges to the acquisition of both macrostructure and microstructure narrative skills among monolingual and bilingual children (e.g., Boerma et al., 2016; Cleave et al., 2010; Méndez et al., 2015; Thordardottir et al., 2015).
Investigating these challenges and implementing narrative intervention are of special importance for children with BiDLD, whose narrative skills are weak in both languages and who may benefit from cross-linguistic transfer, that is, when instruction in one language enhances narrative performance in the other (Boerma et al., 2016; Squires et al., 2014). The literature review that follows summarizes findings on narrative skills in children with BiDLD and reviews narrative intervention studies with bilingual children.
Narrative skills in bilingual children with DLD
Children with DLD demonstrate weak expressive language skills, which are crucial for successful narrative performance (Bishop, 2017; Leonard, 2017). Most studies of narrative skills in monolingual children have shown weaker performance by children with DLD than by their peers with typical language development (TLD) for both macrostructure and microstructure (Duinmeijer et al., 2012; Fey et al., 2004; Hayward & Schneider, 2000; Reilly et al., 2004; Scott & Windsor, 2000). Studies with bilingual children have yielded inconsistent results. For macrostructure, some studies have found that children with BiDLD produced fewer Story Grammar (SG) elements than bilingual peers with TLD (BiTLD) (Boerma et al., 2016; Fichman et al., 2017; Gagarina et al., 2019; Govindarajan & Paradis, 2019; Hao et al., 2018; Paradis et al., 2013; Rezzonico et al., 2015; Squires et al., 2014) and used fewer causal relations in their narratives (Fichman et al., 2017; Kupersmitt & Armon-Lotem, 2019).
Other studies have found similarities between the groups (Altman et al., 2016; Iluz-Cohen & Walters, 2012; Tsimpli et al., 2016). For microstructure, there is greater consensus: children with BiDLD perform more poorly than children with BiTLD on morphosyntactic features, sentence complexity measures, and narrative length in both languages (Iluz-Cohen & Walters, 2012; Paradis et al., 2013; Rezzonico et al., 2015; Squires et al., 2014; Tsimpli et al., 2016).
One possible reason for the lack of consensus in these findings is the high variability of factors affecting narrative performance, such as exposure to and proficiency in each language (Kapalková et al., 2016). Bilinguals may benefit from the interaction of their two languages through cross-linguistic transfer (Hipfner-Boucher et al., 2014; Squires et al., 2014), although this interaction is not yet fully understood. In addition, the gap between microstructure and macrostructure performance, which is more evident in bilinguals and in children with DLD because of their differing proficiency levels, contributes to the variability in results. Another uniquely bilingual factor that may affect language development in bilingual children is the Age of onset of Bilingualism (AoB), the age at which a child begins to be exposed to the second language, which is often the societal language (Armon-Lotem et al., 2021; Golberg et al., 2008; Paradis et al., 2021; Unsworth, 2013). Later AoB implies a longer period of (mostly) monolingual development in the home, or heritage, language, whereas earlier AoB is associated with the simultaneous early development of two systems (Paradis, 2023). While studies agree that exposure measures are important in typical bilingual narrative development, their impact on children with BiDLD is unclear and has not been sufficiently addressed in research on narrative development. The role of AoB in children with DLD is of special importance because these children often have a later onset of speech and slower language development (Leonard, 2017).
For example, Govindarajan and Paradis (2019) found that exposure factors, such as length of exposure and richness of the second-language environment, predicted the narrative performance of children with BiTLD, but not that of children with BiDLD.
Narrative intervention in bilingual children with DLD
Children with BiDLD who speak a home language different from the language of instruction in preschool or school usually receive minimal academic support in the home language (Zehler et al., 2003). In most cases, intervention is provided only in the societal language, owing both to a shortage of bilingual professionals who can provide treatment in both languages (ASHA, 2010; Jordaan, 2008) and to the common practice of treating exclusively in the societal language to promote social integration and academic success (Capin et al., 2023; Daelman et al., 2023; Drysdale et al., 2015; Kay-Raining Bird et al., 2012). We know of four bilingual narrative intervention studies that implemented intervention in both languages of children with TLD (Armon-Lotem, Rose, & Altman, 2021; Lipner et al., 2021; Petersen et al., 2016; Spencer et al., 2019), two bilingual intervention studies with children with DLD (Lugo-Neris et al., 2015; Thordardottir et al., 2015), and two systematic reviews (Kk Nair et al., 2023; Pico et al., 2021).
Bilingual intervention studies have reported advantages of bilingual over monolingual intervention (Gutierrez-Clellen et al., 2012, 2014; Thordardottir et al., 2015) and/or evidence of cross-linguistic transfer (e.g., Kambanaros et al., 2017; Lü et al., 2023). Thordardottir et al. (2015) examined 29 children from diverse backgrounds in three conditions: monolingual French intervention, bilingual intervention (with home-language instruction provided by parents), and no intervention. The intervention was delivered via narratives targeting vocabulary and syntactic skills. Children's vocabulary and syntax improved, but only the vocabulary gains were attributable to the intervention. The authors concluded that the bilingual condition, when implemented through collaboration with parents, did not create a sufficiently intense bilingual context. Lugo-Neris et al. (2015) conducted 24 small-group intervention sessions in both languages with Spanish–English children with BiDLD, targeting macrostructure (story grammar) and microstructure (vocabulary and syntax). Gains were reported for macrostructure and vocabulary in both languages.
Research has not sufficiently examined which SG elements are most susceptible to improvement following intervention. Intervention studies focusing on SG elements have reported that children produced more complete stories, including elements such as the Initiating Event, Attempt, and Consequence (Petersen & Spencer, 2016). Studies examining the macrostructure skills of children with DLD at a single point in time have suggested that Goals are acquired later and are more challenging to produce (Altman et al., 2016). Some studies have suggested that children with BiDLD use fewer SG elements overall (Altman et al., 2016; Altman et al., 2024; Fichman et al., 2017). Goals and Internal Responses are less “concrete,” so it may be more challenging to increase their use through intervention (Khan et al., 2016).
A unique feature of bilingual intervention delivered in two languages is the potential for cross-linguistic transfer of skills from one language to the other (Ebert et al., 2014; Harvey et al., 2021; Isurin, 2005; Kk Nair et al., 2023; Petersen et al., 2016). Narrative knowledge plays a crucial role in school, and enhancing it via transfer, in addition to instruction in the school language, may have important educational benefits, especially for children with BiDLD. Methodologically, transfer has been inferred when the home language improved in the experimental group following intervention in the societal language, but not in the control group (Petersen et al., 2016). Harvey et al. (2021) inferred transfer because improvement in the societal language in the bilingual treatment group was similar to that in the societal-language-only treatment group, despite the bilingual group having had half the intervention time. Lü et al. (2023) claimed transfer of definition skills based on significant correlations between home- and societal-language scores. Armon-Lotem et al. (2021) and Lipner et al. (2021) demonstrated transfer of vocabulary knowledge by examining improvement in the home or societal language after intervention in the other language, using a different set of words in each language, and Lipner et al. (2021) showed that this transfer of lexical knowledge was bidirectional.
Based on these studies, cross-linguistic transfer of narrative skills appears possible, but empirical evidence is still limited, especially for children with DLD and in the context of narrative intervention targeting both macrostructure and microstructure skills.
In sum, only a handful of studies have implemented narrative intervention targeting both macrostructure and microstructure in both languages, and some have suggested the possibility of cross-linguistic transfer of macrostructure skills. The present paper examined narrative performance in both languages following two blocks of intervention, first in the home language and then in the societal language.
The current research
The study examined whether narrative macrostructure and microstructure skills develop in both languages of Russian-Hebrew preschool children with BiDLD and their peers with BiTLD following a bilingual narrative intervention (BINARI). Russian was the children's Home Language (HL), and Hebrew was their Societal Language (SL). All children had been clinically referred for treatment with a Speech-Language Pathologist (SLP). The study also addressed possible cross-linguistic transfer of macrostructure and microstructure skills. The following research questions were addressed:
1. Group. To what extent does narrative performance, evaluated using macrostructure and microstructure measures, differ between children with BiDLD and BiTLD at all stages of the bilingual narrative intervention?
2. Language. Do the macrostructure and microstructure of narratives produced by children with BiDLD and BiTLD differ between HL/Russian and SL/Hebrew?
3. Within- vs. cross-language change. To what extent do macrostructure and microstructure skills change across the four progress monitoring (PM) time points when the language of the intervention matches the language of the PM (within-language) and when it does not (cross-language)?
4. Exposure. To what extent is AoB related to the macrostructure and microstructure performance of children with BiDLD and BiTLD?
Children with BiDLD are predicted to perform like children with BiTLD on macrostructure and to show weaker performance on microstructure (in particular, verbal productivity and morphosyntax) across all stages of the intervention (e.g., Armon-Lotem et al., 2020; Iluz-Cohen & Walters, 2012). Children may perform better in HL/Russian than in SL/Hebrew at all time points (Lipner et al., 2021). Bilingual narrative intervention is expected to result in changes both within and across languages, with change across languages being compatible with cross-language transfer (Armon-Lotem et al., 2021). Children exposed earlier to the SL may show better performance in the HL (Lipner et al., 2021).
Method
Participants
Seventeen Russian-Hebrew bilingual children at risk for DLD were included in the study. All children had been referred to the clinic from which they were recruited by a professional in their educational setting, because of weak performance in SL/Hebrew. Initially, 24 bilingual children were screened in the clinic; four were excluded because they could not carry on a basic conversation in HL/Russian, and three were excluded at the end of the intervention because of infrequent attendance. Parents provided written informed consent, and children gave oral assent; they were informed that they could discontinue participation at any time. The study was approved by the IRB and the Office of the Chief Scientist of the Ministry of Education.
BiDLD criteria
Because of the unavailability of bilingual SLPs in Israeli clinics and the absence of bilingual standardized assessment tools, children are usually screened in clinics using monolingual standardized tests in SL/Hebrew. As a result, bilingual children are often over-diagnosed with DLD, which leads to their incorrect placement in special education preschools (Abutbul-Oz & Armon-Lotem, 2022). In the current research, children's language abilities were screened in both languages using tests normed for bilingual children in HL and SL (see Materials section). Based on this screening and on local bilingual standards (Altman et al., 2021), eight children scored 1.25 SDs below the standard in both languages and were classified as children with BiDLD, following the definition of bilingual DLD (Armon-Lotem, 2014). Nine children scored 1.25 SDs below the mean in SL/Hebrew and 1.25 SDs above the mean in HL/Russian and were classified as children with BiTLD for the purpose of the present study. All 17 parents reported being concerned about their child's language skills. Ten of the 17 parents (6 in the BiDLD group and 4 in the BiTLD group) reported that their child had received some kind of treatment in SL/Hebrew in other clinics.
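The group-assignment rule can be made explicit with a short sketch. It assumes that screening scores are already z-scores against the local bilingual standards and that a score at or below the -1.25 SD cutoff counts as "below the standard"; the function names and the handling of profiles not described in the text (e.g., low HL only) are hypothetical illustrations, not the study's scoring procedure.

```python
# Hypothetical sketch of the screening-based group assignment.
# Assumptions (not from the study): inputs are z-scores against local
# bilingual standards, and a score at or below -1.25 counts as "below".

CUTOFF = -1.25  # SD cutoff reported in the text


def is_below(z: float, cutoff: float = CUTOFF) -> bool:
    """True if a z-score falls at or below the cutoff."""
    return z <= cutoff


def classify(z_hl: float, z_sl: float) -> str:
    """Assign a screened child to BiDLD, BiTLD, or neither group."""
    hl_low, sl_low = is_below(z_hl), is_below(z_sl)
    if hl_low and sl_low:
        return "BiDLD"          # below the standard in both languages
    if sl_low:
        return "BiTLD"          # below the standard in SL/Hebrew only
    return "unclassified"       # profile not described in the text


# Hypothetical z-scores: (HL/Russian, SL/Hebrew)
children = {"child_a": (-1.6, -1.9), "child_b": (0.4, -1.5), "child_c": (-0.2, 0.1)}
for name, (z_hl, z_sl) in children.items():
    print(name, classify(z_hl, z_sl))
```

Writing the rule this way makes visible that classification depends on both languages jointly: the same low SL/Hebrew score leads to BiDLD or BiTLD depending only on HL/Russian performance.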
Bilingualism criteria
Children were considered bilingual if they had been exposed to HL/Russian from birth in a home where Russian was spoken by at least one caregiver and could carry on a conversation in both languages. Bilingual status was reported by parents (see Materials section) and verified by the Russian-speaking and Hebrew-speaking experimenters. Children who could not hold a spontaneous conversation in HL or SL were excluded from the study. All children had at least 24 months of exposure to Hebrew and attended preschools (prior to school entry) where Hebrew was the language of instruction and the main language of social interaction.
The final group of participants included 11 boys and 6 girls aged 59 to 76 months (M = 67.5, SD = 5.03). Children's AoB and proficiency information is presented in Table 1.
Table 1. Age, AoB, and proficiency scores in children with developmental language disorder (DLD) and typical language development (TLD)

Note: Russian (HL) proficiency was assessed by the Russian Language Proficiency Test for Multilingual Children (Gagarina et al., 2010); Hebrew (SL) proficiency was assessed by the Goralnik Screening Test for Hebrew (Goralnik, 1995); AoB = Age of onset of Bilingualism.
* z-score based on local bilingual standards for Russian-Hebrew bilingual children.
Materials
Background questionnaire
A parental questionnaire adapted from the Bilingual Parental Questionnaire (Abutbul-Oz & Armon-Lotem, 2022) elicited information about age, developmental milestones, exposure to HL and SL, language preferences, and speech-language treatment.
Language screening
To assess proficiency in Russian, the Russian Language Proficiency Test for Multilingual Children (Gagarina et al., 2010) was administered. It includes measures of expressive language (noun and verb naming, production of case and verb inflections) and receptive language (comprehension of grammatical constructions, nouns, and verbs). Hebrew proficiency was assessed using the Goralnik Screening Test for Hebrew (Goralnik, 1995), which includes six subtests: vocabulary, sentence repetition, comprehension, oral expression, pronunciation, and storytelling. Local bilingual standards were applied for scoring (Altman et al., 2021; Armon-Lotem & Meir, 2016). Both tests had been used with large samples of bilingual children before the standards were established (Altman et al., 2021; Fichman et al., 2017; Fichman & Altman, 2023).
Narrative instrument
Narrative intervention materials were based on Spencer et al. (2019) and were culturally adapted to the Israeli bilingual environment (Armon-Lotem et al., 2021; Lipner et al., 2021). Overall, 20 picture sets were used: eight for Progress Monitoring (four in Russian and four in Hebrew) and 12 for the narrative intervention (six in Russian and six in Hebrew; see Procedure section). A different story was used in each testing and intervention session, but all stories were designed to be similar in length, macrostructure, and language complexity. In terms of macrostructure, all stories were based on seven SG elements appropriate for ages 5–6: character, problem, feeling, goal, attempt, outcome, and internal response. Each set had a narrative script. The scripts were similar in lexical and syntactic complexity; each included four subordinate clauses with temporal connectors (e.g., “After Yael reached behind her coat, she found her hidden scarf”) and causal connectors (e.g., “Yael was frustrated because her red scarf was hidden”). All stories were based on themes relevant for ages 5–6, such as the difficulty of reaching a book on a shelf or looking for a lost item.
Procedure
Research design
The study employed a single-arm, within-subject pre-post design, the main goal of which was to monitor progress in narrative skills in both languages of clinically referred bilingual children classified as BiDLD or BiTLD. A randomized controlled trial design was not used, primarily because it would not have been ethically appropriate in a setting where participants were clinically referred: all children referred to the clinic receive treatment, and a group of children given a different language treatment could not serve as a control group. The most appropriate design was therefore a single-arm, within-subject pre-post design in which children served as their own controls, both within and across languages. Such a design is appropriate for implementation in a kindergarten or clinic (Green & Klecan-Aker, 2012).
Screening tests in HL and SL yielded two groups: children performing below age-appropriate norms in both languages were assigned to the BiDLD group, and children performing below age-appropriate norms in only one language were assigned to the BiTLD group. This allowed us to address the first research question, comparing the macrostructure and microstructure performance of the two groups across four time points.
Comparing the two languages in both groups allowed us to examine differences in narrative performance between HL and SL at four time points, which was the focus of the second research question. Two blocks of intervention were conducted, the first in HL and the second in SL. Narrative skills were tested in both languages before and after each block, making it possible to establish whether narrative performance changed only in the language of the intervention (within-language change) or in both languages (potential cross-linguistic transfer). This design addressed the third research question.
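The interval logic behind the third research question can be sketched as a lookup from each pair of consecutive PMs to the language of the intervention block between them. The mapping itself follows the text; the function name, the labels, and calling the PM3-to-PM4 interval "maintenance" (no intervention, six-week follow-up) are illustrative assumptions.

```python
# Sketch of how each PM-to-PM change can be labelled under the BINARI design.
# Interval-to-language mapping follows the text; names and labels are
# illustrative assumptions.

# Language of the intervention block between consecutive PMs
# (None = no intervention, i.e., the six-week follow-up interval).
BLOCK_BETWEEN = {("PM1", "PM2"): "HL", ("PM2", "PM3"): "SL", ("PM3", "PM4"): None}


def change_type(start: str, end: str, pm_language: str) -> str:
    """Label a change measured in one PM language over one interval."""
    block = BLOCK_BETWEEN[(start, end)]
    if block is None:
        return "maintenance"
    if block == pm_language:
        return "within-language"
    return "cross-language (potential transfer)"


for start, end in BLOCK_BETWEEN:
    for lang in ("HL", "SL"):
        print(f"{start}->{end} in {lang}: {change_type(start, end, lang)}")
```

Because every child is tested in both languages at every PM, each intervention block yields one within-language and one cross-language comparison, which is what lets children serve as their own controls across languages.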
The overall procedure spanned 12–16 weeks and is presented in the schematic outline of the BIlingual NARrative Intervention (BINARI).
Before the intervention sessions, two language screening sessions were conducted in HL and SL over two weeks, one week apart. The program included four progress monitoring (PM) sessions in each language (eight PM sessions in total), conducted before the intervention, after the intervention in HL, after the intervention in SL, and six weeks after the intervention ended. There were 12 intervention sessions (six in HL and six in SL), with the six sessions in each language (first HL, then SL) conducted over three weeks. Thus, each participant took part in 22 sessions: two screening sessions, eight PM sessions, and 12 intervention sessions.
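The session arithmetic can be checked with a small schedule list. The ordering follows the text; treating the two screening sessions as one per language and counting each PM language as a separate session are assumptions made for this illustration.

```python
# Sketch of one child's BINARI session schedule, used only to check the
# session arithmetic. Assumption: one screening session per language.
from collections import Counter

schedule = (
    [("screening", "HL"), ("screening", "SL")]
    + [("PM1", "HL"), ("PM1", "SL")]
    + [("intervention", "HL")] * 6    # twice a week for three weeks
    + [("PM2", "HL"), ("PM2", "SL")]
    + [("intervention", "SL")] * 6    # twice a week for three weeks
    + [("PM3", "HL"), ("PM3", "SL")]
    + [("PM4", "HL"), ("PM4", "SL")]  # six weeks after the last session
)

# Collapse PM1..PM4 into a single "PM" category before counting
kinds = Counter("PM" if kind.startswith("PM") else kind for kind, _ in schedule)
print(len(schedule), dict(kinds))
```

The total of 22 sessions (2 + 8 + 12) matches the figure reported above.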
Progress monitoring (PM)
At each of the four time points, participants retold one narrative in each language (HL and SL), as follows: PM1 provided pre-intervention (baseline) performance; PM2 followed the intervention in HL/Russian; PM3 followed the intervention in SL/Hebrew; and PM4 took place six weeks after the last intervention session. The elicitation materials (story scripts and pictures) used for retelling in the PMs were single-episode stories culturally adapted from Puente de Cuentos (Spencer et al., 2017). The stimulus stories were based on Stein and Glenn's (1979) story grammar model. The HL/Russian stories averaged 71 words, and the SL/Hebrew stories averaged 68 words. The narratives in both languages were matched for plot structure, syntactic complexity, and mental state terms.
In each PM session, the child first looked at the pictures while the experimenter told the story. The child was then asked to retell the story with the pictures laid out in front of them. The experimenters, trained by the second and fourth authors, were instructed not to interfere with the child's storytelling: if the child hesitated while retelling, the child was encouraged to continue, but no verbal prompts were given. PM sessions were conducted individually in the clinic.
Intervention
Intervention was conducted in small groups of 3–4 children in the clinic. It consisted of two blocks of six sessions each, one in HL (Russian) and one in SL (Hebrew), for a total of 12 sessions with 12 different stories, all with the same structural features as those used in the PM sessions. Sessions were held twice a week, each language block lasted three weeks, and each session lasted 20–25 minutes. Sessions in each language were conducted by a native speaker of that language: a student of Speech-Language Pathology conducted the intervention in Russian, and a certified SLP (the first author) conducted it in Hebrew. The intervention procedure consisted of five steps: 1) story modeling and introduction of icons for the targeted elements; 2) pairing icons with gestures; 3) group retelling; 4) explicit focus on target features (for macrostructure: character, problem, feeling, goal, attempt, outcome, and internal response; for microstructure: productivity, accuracy, and complexity); and 5) individual retelling. In each intervention session, the experimenter first read the story while the children looked at the pictures. The experimenter then explained each of the seven SG elements: character, problem, feeling, goal, attempt, outcome, and internal response. The explanation involved defining each SG element, showing an icon for it, and demonstrating the gesture associated with that icon (Spencer et al., 2019). The icons and gestures were used throughout all intervention sessions. During the intervention, the children were asked to repeat parts of the story together with the experimenter, which gave each child 4–6 opportunities per session to participate and tell their part of the narrative. For further support, the experimenter gave verbal encouragement when necessary.
The procedure encouraged group interaction and elaboration, and at the same time allowed individual participation of each child.
The intervention took place in a quiet area of the clinic. The project was conducted in 2022, and parts of the intervention overlapped with COVID-19 isolation and lockdown periods. In such cases, the experimenter completed the missed intervention sessions individually immediately after the isolation period (5 individuals in Hebrew and 9 in Russian). Between PM2 and PM3, there was a national lockdown, and children had no exposure to SL/Hebrew at preschool.
Fidelity of intervention
Experimenters who administered the intervention were trained by two of the authors. A step-by-step protocol for each session was created for each language and piloted before the study (Lipner et al., 2019). Each experimenter was given identical folders with printed materials for each intervention session and checked that all materials were present before the session. After each session, the experimenter confirmed that every item on the fidelity checklist had been covered. The checklist included (i) a list of all tasks to be performed, in order, and (ii) a list of responses to be elicited from each child.
Coding
Narratives from all four PM sessions in both languages were transcribed and coded by independent research assistants trained to transcribe using CHAT conventions (MacWhinney, Reference MacWhinney2000) and who were ignorant of participants’ clinical status. The division into utterances was based on C-units, where each C-unit consisted of one main clause or a main clause with a single subordinate clause (Hunt, Reference Hunt1970; Loban, Reference Loban1976). Each transcribed narrative was then coded for macrostructure and microstructure elements.
Coding for macrostructure included examining each utterance for whether the child produced one of the SG elements: character, problem, feeling, goal, attempt, outcome, or internal response. The total macrostructure score was the sum of the responses rated on a three-point scale, as follows: 2 points for accurate production of an SG element, 1 point for a general description of the element, and no points if the element was not produced at all. This yielded a maximum score of 14 for each child for each PM story. A second coding system was used for one of the analyses, where the production of each element was assigned a binary coding, that is, 1 point if the child produced an element, and 0 points if the child did not mention the element.
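The two macrostructure coding schemes can be sketched as follows. The sketch assumes a coder has already rated each SG element 0, 1, or 2 and that an element absent from the ratings counts as 0; mapping any rating above 0 to "produced" in the binary scheme is our reading of the text, and all function and variable names are hypothetical.

```python
# Sketch of the two macrostructure coding schemes.
# Assumptions: a coder has already rated each SG element 0, 1, or 2, and a
# missing element counts as 0; in the binary scheme, any production
# (rating 1 or 2) is coded as 1.

SG_ELEMENTS = ("character", "problem", "feeling", "goal",
               "attempt", "outcome", "internal response")


def total_score(ratings: dict[str, int]) -> int:
    """Sum of the 0/1/2 ratings over the seven elements (maximum 14)."""
    scores = [ratings.get(element, 0) for element in SG_ELEMENTS]
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each rating must be 0, 1, or 2")
    return sum(scores)


def binary_coding(ratings: dict[str, int]) -> dict[str, int]:
    """1 if an element was produced at all, 0 if it was not mentioned."""
    return {element: int(ratings.get(element, 0) > 0) for element in SG_ELEMENTS}


# Hypothetical ratings for one PM story (internal response not produced)
story = {"character": 2, "problem": 2, "feeling": 1, "goal": 0,
         "attempt": 2, "outcome": 1}
print(total_score(story))                  # → 8
print(sum(binary_coding(story).values()))  # → 5
```

The binary scheme discards the distinction between accurate and general productions, which is what makes it suitable as the outcome of a binomial model of whether each element appears at all.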
Microstructure measures were: total number of word tokens (TW), number of different words (NDW), number of C-units, C-unit complexity (percent of complex C-units), and C-unit accuracy (percent of errorless C-units). A complex C-unit was defined as one including two clauses connected via coordination or subordination.
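A minimal Python sketch of how these measures could be computed from a transcript that has already been segmented and coded into C-units. The CUnit record and the microstructure helper are hypothetical, the English example story is invented for illustration, and NDW is simplified to distinct lower-cased word forms; counts for Russian or Hebrew transcripts would depend on how words and word forms are counted in the CHAT transcripts, which this sketch does not model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CUnit:
    words: list[str]   # word tokens in the C-unit (tokenization is assumed)
    is_complex: bool   # two clauses joined by coordination or subordination
    has_error: bool    # at least one error coded in the C-unit


def microstructure(cunits: list[CUnit]) -> dict[str, float]:
    """Compute TW, NDW, number of C-units, Complexity (%), and Accuracy (%)."""
    if not cunits:
        raise ValueError("narrative contains no C-units")
    tokens = [w.lower() for cu in cunits for w in cu.words]
    n = len(cunits)
    return {
        "TW": len(tokens),
        "NDW": len(set(tokens)),  # distinct surface forms, no lemmatization
        "C-units": n,
        "Complexity": 100 * sum(cu.is_complex for cu in cunits) / n,
        "Accuracy": 100 * sum(not cu.has_error for cu in cunits) / n,
    }


story = [
    CUnit("the dog saw a bird".split(), is_complex=False, has_error=False),
    CUnit("the dog wanted to catch the bird".split(),
          is_complex=True, has_error=False),
    CUnit("he jump on the tree".split(), is_complex=False, has_error=True),
    CUnit("the dog was sad because the bird flew away".split(),
          is_complex=True, has_error=False),
]
print(microstructure(story))
# {'TW': 26, 'NDW': 17, 'C-units': 4, 'Complexity': 50.0, 'Accuracy': 75.0}
```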
Data analyses
All statistical analyses were conducted in R, version 4.0.5 (R Core Team, 2017). Descriptive statistics are reported first for all measures. To test which factors contributed to the macrostructure and microstructure measures, linear mixed model (LMM) analyses were performed. LMMs were used because of the repeated-measures design of the study, in which each participant was tested at four PMs and in both languages. LMMs are recommended for clinical intervention studies, since the inclusion of random effects allows for control of individual variation (Wiley & Rapp, 2019). Moreover, the language abilities of children with DLD are widely recognized to be highly variable, and including a random factor partially addresses this challenge. Analysis of each measure began with the null model, which included only random effects; each subsequent model added fixed factors and their interactions. Likelihood ratio tests were used to compare consecutive nested models: both models were fitted with maximum likelihood estimation, and their likelihoods were compared statistically to establish the significance of each added fixed factor (Brown, 2021). A model including a factor of interest (e.g., Group) was compared with a model lacking that factor (for the first factor tested, the null model with only the random factor). All models included a by-Participant random factor.
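The likelihood ratio test described above reduces to a simple computation once the two nested models have been fitted with maximum likelihood: twice the difference in log-likelihoods is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The Python sketch below assumes integer degrees of freedom and uses closed-form chi-square tail formulas so that it needs only the standard library; the function names are hypothetical. In R, this is the comparison that anova() reports for two lmer fits.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_k x^k / (1*3*...*(2k+1))
    p = math.erfc(math.sqrt(half))
    if df > 1:
        term = math.sqrt(2 * x / math.pi) * math.exp(-half)  # k = 0 term
        total = term
        for k in range(1, (df - 1) // 2):
            term *= x / (2 * k + 1)
            total += term
        p += total
    return p


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float,
                          df_diff: int) -> tuple[float, float]:
    """Statistic 2 * (logLik_full - logLik_reduced) and its p-value."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# PM has four levels, so adding it to the model adds 3 parameters (df = 3).
print(round(chi2_sf(8.69, 3), 3))  # 0.034
```

As a check against the reported results, χ2 = 8.69 on 3 df gives p ≈ .034, in line with the reported p = .03 for PM in the total macrostructure analysis.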
For macrostructure, two analyses were performed. First, to test which factors predicted the total macrostructure score, we ran an LMM analysis with the fixed factors Group (BiDLD/BiTLD), Language (HL/SL), PM (1/2/3/4), AoB (a continuous measure), and their interactions. The dependent variable was the total macrostructure score (maximum = 14). Second, to test the probability of producing each macrostructure element (character, problem, feeling, goal, attempt, outcome, and internal response), a generalized linear mixed model (GLMM) with a binomial distribution was used, in which each element was coded as binary (1 = produced, 0 = not produced).
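For the binomial GLMM, each row of the analysis data holds a single produced/not-produced outcome. The Python sketch below shows that reshaping, assuming the graded ratings are stored per child, language, and PM; the data layout, the child ID, and the function name to_long_binary are hypothetical. In R, a table of this shape could, for example, be passed to lme4’s glmer() with family = binomial; this section does not state which function was used for the GLMM.

```python
from __future__ import annotations

SG_ELEMENTS = ("character", "problem", "feeling", "goal",
               "attempt", "outcome", "internal_response")


def to_long_binary(
    scores: dict[tuple[str, str, int], dict[str, int]],
) -> list[dict]:
    """Expand graded ratings into one binary row per child x language x PM x element."""
    rows = []
    for (child, language, pm), ratings in sorted(scores.items()):
        for element in SG_ELEMENTS:
            rows.append({
                "child": child,
                "language": language,
                "pm": pm,
                "element": element,
                # 1 = produced (rating 1 or 2), 0 = not produced
                "produced": int(ratings.get(element, 0) > 0),
            })
    return rows


scores = {
    ("C01", "HL", 1): {"character": 2, "problem": 1},
    ("C01", "SL", 1): {"character": 1},
}
rows = to_long_binary(scores)
print(len(rows), sum(r["produced"] for r in rows))  # 14 3
```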
For microstructure, separate LMMs were built for the following measures: TW, NDW, number of C-units, Complexity (percent of complex C-units), and Accuracy (percent of errorless C-units). In each analysis, the fixed factors were Group (BiDLD/BiTLD), Language (HL/SL), PM (1/2/3/4), AoB (a continuous measure), and their interactions.
In these analyses, a main effect of Group would indicate a difference in performance between children with BiDLD and BiTLD. A main effect of Language would reveal differences between HL and SL, collapsed across Groups and all four PMs. An effect of PM would indicate changes in both languages across the four time points. Since the first block of intervention was conducted only in HL and the second block only in SL, improvement in a language that was not the target of the preceding intervention block would be interpreted as evidence of transfer. A Language*PM interaction would indicate that the change over time was stronger in one language than in the other.
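The interpretive logic above can be made explicit with a small Python sketch. It assumes that PM2 follows the HL block and PM3 follows the SL block, as described in the Results and Discussion, and it treats PM4 as a follow-up with no new block, which is an assumption. The function name and the gains are hypothetical raw differences; the study itself draws its inferences from the PM and Language*PM terms of the mixed models, and a rule like this cannot separate intervention effects from maturation.

```python
from __future__ import annotations

# Assumed schedule: the language targeted in the intervention block that
# immediately precedes each PM. PM1 is baseline; PM4 is treated as a
# follow-up with no new block (an assumption, not stated in this section).
TARGET_BEFORE_PM = {2: "HL", 3: "SL"}


def classify_change(language: str, pm: int, gain: float,
                    min_gain: float = 0.0) -> str:
    """Label the change from the previous PM as direct, transfer, or neither."""
    target = TARGET_BEFORE_PM.get(pm)
    if target is None or gain <= min_gain:
        return "no attributable gain"
    # A gain in the language that was not just targeted counts as transfer.
    return "direct" if language == target else "transfer"


print(classify_change("HL", 2, 1.2))   # direct
print(classify_change("SL", 2, 0.8))   # transfer
print(classify_change("HL", 3, 0.3))   # transfer
print(classify_change("SL", 3, -0.1))  # no attributable gain
```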
LMM analyses were run using the lmer function from the lmerTest package (Kuznetsova et al., 2017). Likelihood ratio tests determined the significance of each factor by comparing pairs of nested models; each comparison reports AIC, BIC, log-likelihood, deviance, the chi-square statistic, and the p-value. Post-hoc tests were run using contrasts from the emmeans package (Lenth, 2019). Predicted values were plotted using the ggplot function from the ggplot2 package (Wickham, 2016) or the plot_model function from the sjPlot package (Lüdecke, 2021). When plotting predicted values or predicted probabilities with two categorical predictors (Figures 1, 3, 4, and 5), dots represent the mean predicted scores/probabilities, and error bars represent a range of values around the mean. Figures 2 and 6 display predicted values (lines) and confidence intervals (colored areas) across the continuous variable (AoB) plotted on the x-axis.

Figure 1. Predicted total macrostructure score across four progress monitorings in HL/Russian and SL/Hebrew.

Figure 2. Predicted total macrostructure score for HL/Russian and SL/Hebrew as a function of Age of onset of Bilingualism.

Figure 3. Predicted probabilities of producing each SG element in HL/Russian and SL/Hebrew across four PMs. Note: IE = Initiating Event; IR = Internal Response; PM = Progress Monitoring.

Figure 4. Predicted number of different words scores by Group and Language across four progress monitorings.

Figure 5. Predicted percentage of errors by Group and Language across four progress monitorings.

Figure 6. Predicted percentage of errors for HL/Russian and SL/Hebrew as a function of Age of onset of Bilingualism.
Results
Findings for total macrostructure scores for the two groups (BiDLD and BiTLD) in both languages for each of the four PMs are presented first, followed by the performance on each macrostructure element. Next, microstructure measures of productivity (TW, NDW, and number of C-units), complexity, and accuracy for each language are presented. Findings for the impact of AoB and transfer effects are integrated into each analysis.
Total macrostructure score
Table 2 displays the total macrostructure scores for each of the four PMs in the two languages for children with BiDLD and BiTLD.
Table 2. Total macrostructure score (Means and SDs) for each of four PMs for children with bilingual developmental language disorder (BiDLD) and bilingual typical language development (BiTLD)

Note: PM = Progress Monitoring.
Table 2 shows that children with BiDLD had lower scores than children with BiTLD at all four PMs and that all children performed better in HL/Russian than in SL/Hebrew. To test whether the performance of children with BiDLD and children with BiTLD differed significantly across PMs in HL/Russian and SL/Hebrew, LMMs with the following fixed factors were tested in this order: Group, Language, PM, AoB, and their interactions. The analyses included by-Participant random intercepts. The following factors emerged as significant: Language, χ2 = 52.16, p < .001; PM, χ2 = 8.69, p = .03; and the Language*AoB interaction, χ2 = 7.62, p = .02. Table A1 in the Appendix presents the results of the likelihood ratio tests. The final optimal model included the fixed factors Language, PM, and Language*AoB, plus the random factor. The fixed factors explained 35% of the variance, and the random factor explained another 20%. For the Language factor, HL/Russian had higher macrostructure scores than SL/Hebrew. For differences between PMs, reverse Helmert coding was used for planned comparisons, since this coding compares each PM level with the mean of the preceding level(s). The analysis revealed that PM2 was significantly higher than PM1 (p = .03), whereas PM3 did not differ significantly from the mean of PM1 and PM2 (p = .87), and PM4 did not differ significantly from the mean of all previous PMs (p = .37). Cohen’s d for the PM1–PM2 change was 0.5, indicating a medium effect size. This pattern held for both languages. Figure 1 plots the predicted total macrostructure score for the four PMs in HL/Russian and SL/Hebrew.
As shown in Figure 1, children’s scores were higher at PM2 than at PM1, and scores for HL/Russian (red) were higher than scores for SL/Hebrew (blue) at all four PMs.
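The reverse Helmert comparisons used above can be written as contrast weights applied to the four PM means. In the Python sketch below (hypothetical helper names and made-up means, not the study’s data), row j compares PM j+1 with the mean of the earlier PMs, and each row sums to zero. R’s built-in contr.helmert() encodes the same comparisons with unscaled integer codes, so model coefficients under that coding are rescaled versions of these differences.

```python
from __future__ import annotations


def reverse_helmert_weights(n_levels: int) -> list[list[float]]:
    """Row j compares level j+1 with the mean of levels 1..j (weights sum to 0)."""
    weights = []
    for j in range(1, n_levels):
        weights.append([-1.0 / j] * j + [1.0] + [0.0] * (n_levels - j - 1))
    return weights


def contrast_estimates(means: list[float],
                       weights: list[list[float]]) -> list[float]:
    """Apply each row of contrast weights to the level means."""
    return [sum(w * m for w, m in zip(row, means)) for row in weights]


# Hypothetical PM means (PM1..PM4), not the study's values.
means = [6.0, 7.5, 7.4, 7.6]
labels = ["PM2 vs PM1", "PM3 vs mean(PM1-2)", "PM4 vs mean(PM1-3)"]
for label, est in zip(labels,
                      contrast_estimates(means, reverse_helmert_weights(4))):
    print(f"{label}: {est:.2f}")
# PM2 vs PM1: 1.50
# PM3 vs mean(PM1-2): 0.65
# PM4 vs mean(PM1-3): 0.63
```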
To interpret the Language*AoB interaction, the predicted macrostructure scores for Russian and Hebrew were plotted as a function of AoB, collapsed across the four PMs (Figure 2).
As seen in Figure 2, the total predicted macrostructure score decreased sharply as a function of AoB for SL/Hebrew, whereas the decrease for HL/Russian was negligible. In other words, later AoB was associated with a lower predicted macrostructure score in SL/Hebrew but was not related to the predicted macrostructure score in HL/Russian.
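A Language*AoB interaction means that the slope of the predicted score on AoB differs between the two languages. The Python sketch below shows this with treatment coding (HL as the reference level, an assumption) and invented coefficients chosen only to reproduce the qualitative pattern described above; they are not the fitted values, and the AoB unit (years) is also an assumption.

```python
from __future__ import annotations


def predicted_score(aob: float, language: str, coef: dict[str, float]) -> float:
    """Linear predictor with a Language x AoB interaction (HL = reference)."""
    sl = 1.0 if language == "SL" else 0.0  # treatment (dummy) coding, assumed
    return (coef["intercept"] + coef["SL"] * sl
            + coef["AoB"] * aob + coef["SL:AoB"] * sl * aob)


def aob_slope(language: str, coef: dict[str, float]) -> float:
    """Change in the predicted score per unit of AoB within one language."""
    return coef["AoB"] + (coef["SL:AoB"] if language == "SL" else 0.0)


# Hypothetical coefficients; AoB assumed to be in years.
coef = {"intercept": 9.0, "SL": -1.0, "AoB": -0.05, "SL:AoB": -0.6}
for lang in ("HL", "SL"):
    print(lang, round(aob_slope(lang, coef), 2),
          round(predicted_score(0, lang, coef), 2),
          round(predicted_score(4, lang, coef), 2))
# HL -0.05 9.0 8.8
# SL -0.65 8.0 5.4
```

The interaction coefficient is exactly the difference between the two language-specific slopes, which is why a flat HL line and a steep SL line can come from a single model.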
Story grammar elements
To test the probabilities of producing each SG element, a GLMM analysis with binomial distribution was conducted. The analyses included random by-Participant intercepts and random slopes by macrostructure element. All fixed factors in the order tested and the results of likelihood ratio tests are presented in Table A2 in the Appendix.
The following fixed factors were significant: Group, χ2 = 5.47, p = .02; Language, χ2 = 54.34, p < .001; PM, χ2 = 15.96, p = .001; and SG Element, χ2 = 35.73, p < .001. For Group, children with BiDLD had lower probabilities of producing SG elements. The Language and PM effects confirmed the findings reported for the total macrostructure score. For the effect of Element, Feelings, Goals, and IR were the elements with the lowest probabilities of being produced (p < .001). The final optimal model explained 42% of the variance by the fixed factors and 16% by the random factors. As for the total macrostructure score, HL/Russian performance was significantly better than SL/Hebrew performance at all four PMs; in addition, the BiTLD group performed better than the BiDLD group in both languages.
To explore the effect of SG elements across languages for the two groups, we plotted the predicted probabilities of producing each element (Figure 3).
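The GLMM estimates effects on the log-odds scale, and predicted probabilities like those plotted for each SG element are obtained by passing the linear predictor through the inverse logit. The Python sketch below uses invented coefficients; treatment coding with BiTLD, HL, and Character as reference levels is an assumption, as are the helper names.

```python
import math


def inv_logit(eta: float) -> float:
    """Map log-odds to a probability in a numerically stable way."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def predicted_probability(intercept: float, effects: list[float]) -> float:
    """Sum the intercept and the log-odds coefficients that apply to one cell."""
    return inv_logit(intercept + sum(effects))


# Hypothetical log-odds coefficients (reference: BiTLD, HL, Character).
intercept = 1.2
bidld, sl, goal = -0.6, -1.0, -1.5
print(round(predicted_probability(intercept, []), 3))                 # 0.769
print(round(predicted_probability(intercept, [bidld, sl, goal]), 3))  # 0.13
```

Because the inverse logit is nonlinear, the same log-odds effect produces a larger change in probability near 0.5 than near 0 or 1, which is one reason to inspect the predicted probabilities element by element.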
Macrostructure analysis showed BiDLD–BiTLD similarity for the total score; however, group differences emerged for individual SG elements. Performance was better in HL/Russian than in SL/Hebrew for both groups. Children in both the BiDLD and BiTLD groups improved significantly in both languages following intervention in HL/Russian (at PM2), with no subsequent increase in scores at PM3 and PM4. In addition, macrostructure elements increased in SL/Hebrew, in particular Feeling and Goal, which were very low initially and improved across the four PMs. An AoB effect emerged in SL/Hebrew for both groups: the earlier the onset of bilingualism, the higher the predicted total macrostructure score in Hebrew.
Microstructure
Narrative microstructure analysis included five measures, three for productivity (Total Words, Number of Different Words, and number of C-units), one measure of Complexity (percent of complex C-units), and one measure of Accuracy (percent of accurate C-units).
Productivity
Table 3 summarizes the descriptive statistics (means and standard deviations) for the productivity measures in HL/Russian and SL/Hebrew for the two groups (BiDLD/BiTLD) across four PMs.
Table 3. Descriptive statistics (means and standard deviations) for the productivity measures in HL/Russian and SL/Hebrew for the two groups (bilingual developmental language disorder/bilingual typical language development [BiDLD/BiTLD]) across four PMs

To test which factors contributed to the productivity measures, LMM analyses were applied to each measure with the following fixed factors: Group, Language, PM, and AoB, as well as the interactions among these factors. The analyses included by-Participant random intercepts and by-Story random slopes. For all three measures, the models with random slopes failed to converge, so only random intercepts were tested. Table A3 in the Appendix shows all the fixed factors in the order in which they were tested, together with the results of the likelihood ratio tests.
For TW, Group, χ2 = 6.26, p = .01, and AoB, χ2 = 4.33, p = .04, were significant. The variance explained by the fixed factors in the final model was 18% with another 10% explained by the random factor. For Group, children with BiDLD produced shorter narratives than children with BiTLD at all PMs. Later AoB was associated with lower TW. This finding held for both languages, since the Language*AoB interaction was not significant.
For NDW, Group, χ2 = 5.85, p = .02; Language, χ2 = 14.70, p < .001; PM, χ2 = 10.64, p = .01; AoB, χ2 = 6.39, p = .01; and the Language*AoB interaction, χ2 = 3.76, p = .05, were significant. The final model including these factors explained 54% of the variance: 35% by the fixed factors and 19% by the random factor. The Language effect showed greater lexical diversity (NDW) in the HL/Russian narratives than in the SL/Hebrew narratives. For differences across PMs in NDW, reverse Helmert coding revealed that PM2 was significantly higher than PM1 (p = .01); the difference between PM3 and the mean of PM1 and PM2 missed significance (p = .06); and PM4 did not differ significantly from the mean of all previous PMs (p = .65). Cohen’s d for the PM1–PM2 difference was 1.04, a large effect size. This pattern was predicted for both languages. For Group, the narratives of children with BiTLD were more lexically diverse than those of children with BiDLD. AoB was also a significant predictor of lexical diversity. Figure 4 displays the differences across the four PMs and the effects of Language and Group on the predicted NDW scores.
Figure 4 shows greater NDW at PM2 than at PM1 for both the BiDLD and BiTLD groups and in both languages (HL/SL), even though the BiDLD group did not perform as well as the BiTLD group, and HL/Russian showed higher predicted NDW values than SL/Hebrew. Improvement from PM1 to PM2 was similar in both languages for the two groups, as indicated by the absence of significant interactions involving Group or PM.
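The section reports Cohen’s d for the PM1–PM2 change (0.5 for the total macrostructure score, 1.04 for NDW) without stating which variant was used. The Python sketch below uses the pooled-standard-deviation form as an assumption; repeated-measures variants, such as dividing by the standard deviation of the individual change scores, can give noticeably different values. The function name and the scores are invented.

```python
from __future__ import annotations

import math
import statistics


def cohens_d(before: list[float], after: list[float]) -> float:
    """Standardized mean change: (mean_after - mean_before) / pooled SD."""
    n1, n2 = len(before), len(after)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two scores per time point")
    s1, s2 = statistics.stdev(before), statistics.stdev(after)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                          / (n1 + n2 - 2))
    return (statistics.mean(after) - statistics.mean(before)) / pooled_sd


# Hypothetical NDW scores at PM1 and PM2 for three children.
print(cohens_d([10, 12, 14], [12, 14, 16]))  # 1.0
```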
For C-units, none of the fixed factors were significant.
Complexity and accuracy
Table 4 displays the Means and SDs for Complexity (ratio of complex C-units) and Accuracy (ratio of C-units containing errors).
Table 4. Means and SDs for Complexity (ratio of complex C-units) and Accuracy (ratio of C-units containing errors)

Note: PM = Progress Monitoring.
To test the effects of the fixed factors on Complexity and Accuracy, LMM analyses were performed. Table A4 in the Appendix presents the fixed factors in the order in which they were entered into the models. For Complexity, none of the fixed factors was significant. For Accuracy, Language, χ2 = 12.01, p = .001; Group*Language, χ2 = 14.88, p = .001; Language*PM, χ2 = 14.07, p = .03; and Language*AoB, χ2 = 6.82, p = .03, were significant. The best-fit model for Accuracy included these factors, which explained 27% of the variance, and the random factors explained an additional 14%. Children performed better (made fewer errors) in HL/Russian than in SL/Hebrew; however, a significant Group*Language interaction showed that the gap between children with BiDLD and their peers with BiTLD was wider in Russian than in Hebrew, as seen in Figure 5.
Figure 5 shows that children with BiDLD (purple) had more errors than children with BiTLD (green). A post-hoc analysis with Tukey corrections revealed that in HL/Russian, children with BiDLD had a significantly higher percentage of errors than children with BiTLD (p = .03), but the group difference was not significant in SL/Hebrew (p = .20). A post-hoc analysis of the Language*PM interaction, again with Tukey corrections, revealed that in Hebrew, children in both groups produced more errors at PM3 than at PM1 (p = .01) and at PM2 (p = .01), as is also visible in Figure 5; in Russian, there were no significant differences across PMs.
Figure 6 displays the interaction of Language*AoB.
As with NDW, the SL/Hebrew curve (blue) shows that the predicted percentage of C-units with errors rises as a function of AoB: the later the AoB, the higher the percentage of errors in Hebrew. In contrast, in HL/Russian, later AoB is not associated with a higher percentage of C-units with errors, as reflected by the relatively flat curve across AoB values.
Discussion
The primary interest of this study was to examine changes in macrostructure and microstructure skills associated with BIlingual NARrative Intervention (BINARI) at four time points in the narrative performance of Russian-Hebrew preschool children with BiDLD and BiTLD in both their languages. For macrostructure, the total score as well as specific macrostructure elements (especially Feeling and Goal) improved for both groups after the first block of intervention in HL/Russian, but remained stable, and even decreased somewhat, after the second block of intervention in SL/Hebrew. All children performed better in HL than in SL at all four time points. Earlier AoB was related to better macrostructure performance only in SL/Hebrew, not in HL/Russian. Analyses at the level of individual SG elements showed that children with BiDLD performed significantly lower than children with BiTLD. Probability plots for the optimal model revealed that group differences were prominent for most elements, particularly for Character and Goal (Figure 3). For microstructure, children with BiTLD produced longer narratives (more TW) and had higher lexical diversity (greater NDW) than children with BiDLD. Of the five microstructure measures examined (TW, NDW, C-units, Complexity, and Accuracy), only lexical diversity (NDW) showed significant improvement in both languages following the intervention in HL/Russian. For TW, the effect of AoB (but not the Language*AoB interaction) was significant. Children showed greater lexical diversity (NDW) in HL than in SL. For Accuracy, children with BiDLD had a significantly higher percentage of errors than children with BiTLD in HL/Russian, but the difference was not significant in SL/Hebrew. In SL/Hebrew, children made more errors at PM3 than at PM1 and PM2.
Bilingual narrative performance reflects an interplay of child-internal factors, such as relative proficiency in the two languages and the child’s clinical status (BiDLD or BiTLD), and child-external factors, such as the order of intervention (HL first/SL first) and exposure. This multitude of factors makes it challenging to understand the mechanism of narrative development in HL and SL. The present research addressed some of these factors, in particular the child’s clinical status (BiDLD vs. BiTLD), changes in narrative within and across languages, and AoB, offering preliminary answers related to the framework of intervention in both languages among bilingual preschool children.
Clinical status and development of narrative skills
Macrostructure
Findings showed that although children with BiDLD showed lower narrative performance, their rate of improvement was similar to that of children with BiTLD, as evidenced by the increase in the total macrostructure score between PM1 and PM2. Bilingual intervention, using a variety of procedures involving icons/gestures, multiple repetitions, peer interaction, and multimodal processing, created an environment in which narrative skills could develop. The Group differences found here for individual SG elements support previous studies, which have shown lower macrostructure scores for children with BiDLD than for children with BiTLD (e.g., Boerma et al., 2016 for Dutch bilingual children; Paradis et al., 2013 for varied HLs with L2/English). The two groups do, however, show a similar trajectory of improvement, as evidenced by the absence of a Group*PM interaction.
The bilingual children in the present study were recruited from among children who had been clinically referred, largely because of their weak performance in SL/Hebrew. Evidence-based practice calls for evaluating bilingual children in both languages (ASHA, 2019). Following screening and division into two groups, children with BiDLD were found to perform worse than those with BiTLD, but despite this difference, they showed a parallel trajectory across the four PM time points. Examining a range of macrostructure and microstructure abilities proved important for going beyond previous research, which has dichotomized BiDLD–BiTLD differences as similar in macrostructure but different in microstructure. The present study documents group differences at the level of SG elements and lexical diversity (NDW). Bilingual assessment and intervention enabled us to show that children with BiDLD can be expected to progress along the same trajectory as children with BiTLD, and to identify particular features of macrostructure and microstructure that can be targeted in future interventions.
Macrostructure is grounded in the use of specific elements, all of which contribute to a story’s coherence. Examining narrative performance at the level of SG elements was crucial for a more detailed understanding of children’s performance. For the total macrostructure score, the LMM explained 35% of the variance by the fixed factors and 20% by the random factor (55% overall). We then conducted a GLMM with the production of each SG element as the response variable; it explained 42% of the variance by the fixed factors and 16% by the random factor (58% overall). In the LMM, the fixed factors were Language, PM, and Language*AoB; in the GLMM, they were Group, Language, PM, and Element. Thus, the fixed factors in the GLMM explained a greater share of the variance than those in the LMM. It is possible that some of the variance attributed to individual variation in the LMM (i.e., the random factor) is explained in the GLMM by the fixed factors, as suggested by the lower variance explained by the random factor in the GLMM. The need to consider individual SG elements was also evident in Figure 3, which displays the predicted probabilities of each element by Group and Language.
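Percentages of variance explained by the fixed and random parts of a mixed model are commonly computed as marginal and conditional R²: the fixed-effect variance, or the fixed plus random-effect variance, divided by the total variance. The section does not say which method produced its figures, so the Python sketch below is an assumption, with invented variance components chosen to reproduce the LMM’s 35%/20% split and a hypothetical function name. For a logit-link binomial GLMM, the residual term is usually replaced by the latent-scale constant π²/3, so the LMM and GLMM percentages are not on exactly the same scale.

```python
from __future__ import annotations

import math


def r2_mixed(var_fixed: float, var_random: float,
             var_residual: float) -> tuple[float, float]:
    """Marginal R2 (fixed only) and conditional R2 (fixed + random)."""
    total = var_fixed + var_random + var_residual
    return var_fixed / total, (var_fixed + var_random) / total


# For a logit-link binomial GLMM, the residual variance on the latent
# scale is conventionally fixed at pi^2 / 3.
LOGIT_LATENT_RESIDUAL = math.pi ** 2 / 3

# Hypothetical LMM variance components (not the study's estimates).
marginal, conditional = r2_mixed(3.5, 2.0, 4.5)
print(f"fixed: {marginal:.0%}, random: {conditional - marginal:.0%}, "
      f"total: {conditional:.0%}")
# fixed: 35%, random: 20%, total: 55%
```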
Microstructure
The narratives of children with BiDLD contained fewer words and were less lexically diverse than the narratives of the BiTLD group. Moreover, a significant Group*Language interaction emerged for morphosyntactic accuracy: the gap between children with BiDLD and their peers with BiTLD was wider in Russian than in Hebrew. Children with BiDLD had more morphosyntactic errors than children with BiTLD in HL/Russian, but the difference between the groups was not significant in SL/Hebrew. This finding is not surprising, since there is a general consensus that children with DLD have weaker microstructure abilities (Altman et al., 2016; Iluz-Cohen & Walters, 2012; Rezzonico et al., 2015; Squires et al., 2014; Tsimpli et al., 2016). With respect to accuracy, the group difference in HL/Russian arose because children with BiTLD made fewer errors than children with BiDLD, reflecting their higher HL proficiency. Both groups were clinically referred because of their weak performance in SL/Hebrew, but only the BiDLD group showed low proficiency scores in both languages, which suggests that at the age of 5–6, performance in HL is critical for distinguishing typical development from DLD.
Three results require explanation. First, the total macrostructure score did not improve after the second block of intervention in SL/Hebrew. One possible explanation is that the children may have needed more intervention sessions in SL/Hebrew than in HL/Russian in order to show improvement. Since the children were familiar with storytelling in HL/Russian from home, six intervention sessions may have been sufficient for improvement, given their stronger proficiency in that language. In SL/Hebrew, however, more sessions may have been needed due to lower vocabulary levels and less exposure (Armon-Lotem et al., 2021). Second, in SL/Hebrew, children made more errors at PM3 than at the previous PMs. During the intervention, in particular between PM2 and PM3, the COVID-19 pandemic led to a national lockdown that kept children at home, without exposure to Hebrew in preschool. Since the children came from HL/Russian-dominant homes, their only exposure to SL/Hebrew during this period came from the narrative intervention in the current study. Bao et al. (2020) estimated that kindergarten children would lose 67% of their expected literacy gains during lockdown. Thus, the increase in errors may reflect the lockdown. Finally, the unexpected result for TW, where the effect of AoB (but not the Language*AoB interaction) was significant, indicated that later AoB was related to lower TW in both languages.
Enhanced performance in both macrostructure and microstructure raises a question about the connection between macrostructure and microstructure abilities. Macrostructure elements provide scaffolding for story coherence, while lexis and morphosyntax are essential to produce a well-structured story. The question becomes: How can the weak microstructure abilities found among children with DLD be bootstrapped onto relatively intact macrostructure? Two features of the intervention implemented in the present study facilitated this process: repetition and variation. Each child produced 5–6 repetitions of a story in each session. In addition, the intervention involved elaborating content about characters and their feelings, which was modeled by the experimenter as well as by other children in the group. Describing and elaborating story elements in turn created a need for more diverse vocabulary, pointing to an interplay between macrostructure story elements and vocabulary.
Narrative development across languages
The findings documenting cross-language transfer for both BiDLD and BiTLD children contrast with those reported by Petersen et al. (2016), who found cross-linguistic transfer only for children with TLD. The Petersen et al. (2016) study delivered intervention only in SL/English and reported SL-to-HL transfer. The current findings are more in line with the few studies claiming that beginning intervention with the HL creates optimal conditions for transfer, especially for children with BiDLD (Lugo-Neris et al., 2015; Thordardottir, 2010). Transfer of macrostructure skills after the first block of intervention in HL can be explained by evidence that story structure is shared across languages from similar cultural backgrounds (Berman & Slobin, 1994; Squires et al., 2014), and Russian and Hebrew storytelling have been shown to follow similar patterns (Altman et al., 2016).
Lexical diversity (NDW) improved in both languages following the intervention in HL/Russian; the gain in SL/Hebrew, which had not yet been targeted, indicates transfer from HL to SL. The intervention implicitly trained lexical diversity by teaching children to talk about narrative elements they had less experience with, e.g., characters’ feelings, goals, and internal responses, which required new and different words. Theoretically, the high degree of imitation/repetition and variation engendered by the intervention procedures is grounded in Walters’ (2005) model of bilingual production, which draws widely on psychology (e.g., William James’ (1890) ‘consistent ends by variable means’; Clark’s (1987) ‘principle of contrast’) and linguistics (e.g., Tannen’s (1989) five functions of repetition in conversation, including fluency, facilitating comprehension, and serving as connective links), some of which apply equally to narrative.
Establishing a cross-linguistic transfer effect is a challenge because it is difficult to disentangle intervention effects from the effects of time and maturation, especially in the absence of a control condition. However, the modest transfer results in the current study may have important educational implications, since intervention in HL was followed by improvement in both languages (Restrepo et al., 2013). Treatment is usually conducted in the SL, both because of SLPs’ training and because of the lack of an explicitly bilingual policy for treating BiDLD (Abutbul-Oz & Armon-Lotem, 2022). The current intervention procedure offers a uniquely bilingual perspective on the assessment and treatment of children with BiDLD by inviting educators and SLPs to integrate both languages. The optimal balance and order of the two languages have yet to be researched; the current results suggest that beginning the intervention with the HL may lead to positive changes in SL.
Age of onset of Bilingualism and narrative development
The analyses conducted in the current study found three significant interactions involving Language and AoB: for the total macrostructure score, for lexical diversity (NDW), and for morphosyntactic errors (Accuracy). For all three interactions, AoB was related to performance in SL but not in HL. Specifically, later AoB was related to lower total macrostructure scores and lower NDW in SL, and to a higher percentage of errors in SL. Thus, in all three analyses, earlier AoB was associated with better macrostructure and microstructure performance in SL. These findings for the impact of AoB on the SL conform to the adage in other areas of language acquisition that ‘earlier is better.’ One finding, however, departs from this pattern: for TW, the main effect of AoB was significant without a Language*AoB interaction, indicating that later AoB was associated with fewer total words in both HL and SL. One possible explanation is that TW, as a productivity measure, does not necessarily reflect proficiency, and NDW is a better measure of narrative microstructure.
The other part of the interaction documented here, i.e., that AoB had almost no effect on the HL/home language, suggests that narrative macrostructure skills, as well as lexical diversity and morphosyntactic accuracy in HL, are acquired relatively early, remain stable, and do not interact with (lack of) exposure to the SL. This result bears on studies that have reported mixed findings for the effects of AoB on HL vocabulary, morphosyntax, and syntax. Studies focusing on morphosyntax generally show positive effects of AoB (Albirini, 2018; Meir et al., 2017; Montrul, 2008, 2016; Soto-Corominas, 2021), whereas studies of SL syntax give a mixed picture: no effects of AoB or negative effects (Roesch & Chondrogianni, 2016), depending on the syntactic structure examined and the population tested (Chiat et al., 2013; Meir et al., 2017; Kaltsa et al., 2020). Thus, at least for Russian-Hebrew bilingual children in Israel, early AoB (exposure to the SL) is associated with better macrostructure and microstructure performance without negatively affecting HL abilities.
Limitations and future research
The design of the bilingual intervention in the present study included a block of HL intervention sessions followed by a block of SL sessions. This design was adopted because previous studies showed benefits of beginning intervention in the child's HL. To evaluate whether this design is indeed beneficial for bilingual children, future research should compare two intervention orders (HL first, SL second vs. SL first, HL second) and employ a multiple-baseline design. A more focused suggestion, based on the findings for SG elements, would be to construct intervention sessions targeting Feelings and Goals, the elements found to be most vulnerable in the stories of children with BiDLD.
One obvious limitation of the present study was that it could not include a control group, as explained above. For that reason, we employed a within-subject design and focused on changes in narrative performance at four time points: before, during, and after the intervention. The small sample size (n = 17) is another limitation, as it reduces statistical power.
Finally, the present study used single-episode stories for both intervention and progress monitoring. Short stories make narration easier for young children and for those with DLD, but they also constrain the number of opportunities to convey macrostructure elements (each story contains seven possible elements). Future studies should use longer stories in order to elicit greater variability.
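The ceiling imposed by single-episode stories can be made concrete with a small scoring sketch. It assumes, hypothetically, that each of the seven story grammar elements listed in the Introduction is coded as present (1) or absent (0) and that the codes are summed; the study's actual scoring scheme may differ, and the names `SG_ELEMENTS` and `macrostructure_score` are illustrative only. Under this assumption a total macrostructure score can take only eight values (0 to 7), which leaves little room for variability between children or across time points.

```python
# Hypothetical binary coding of the seven story grammar (SG) elements
# listed in the Introduction; the study's actual scheme may differ.
SG_ELEMENTS = (
    "character", "problem", "feeling", "goal",
    "attempt", "outcome", "internal_response",
)


def macrostructure_score(produced: set[str]) -> int:
    """One point per SG element present in a retelling (0/1 codes summed)."""
    unknown = produced - set(SG_ELEMENTS)
    if unknown:
        raise ValueError(f"unrecognized elements: {sorted(unknown)}")
    return sum(1 for element in SG_ELEMENTS if element in produced)


# Every total a single-episode story can yield under this scheme
possible_scores = list(range(len(SG_ELEMENTS) + 1))

if __name__ == "__main__":
    # A hypothetical retelling that omits Feelings and Goals
    retelling = {"character", "problem", "attempt", "outcome", "internal_response"}
    print(macrostructure_score(retelling))  # 5
    print(possible_scores)                  # [0, 1, 2, 3, 4, 5, 6, 7]
```

A multi-episode story would repeat the episode structure and so raise the maximum, which is the sense in which longer stories could elicit greater variability.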
Conclusions and implications
The results of this study suggest that intervention in both languages can be effective for children with BiDLD as well as for those with BiTLD. Children with BiDLD performed more poorly than children with BiTLD, but their improvement from the intervention followed the same trajectory as that of children without impairment. In addition to improvement for both groups, the bilingual intervention showed benefits in both languages as well as transfer of narrative skills from HL to SL. As for what else may have contributed to the beneficial impact of the intervention, one candidate is the social scaffolding offered by the intervention procedure. Children participated in small groups, with a great deal of interaction among peers as well as with the adult research assistant. This procedure created a socio-pragmatic context involving turn-taking and membership in a supportive peer group. Furthermore, the repeated use of icons and gestures for the targeted elements that accompanied the story may have facilitated learning across multiple modalities, viz. auditory, visual, and kinesthetic. Such multisensory channels may be appropriate for children with BiDLD, as they have been shown to be for children with dyslexia and for adolescents at risk for foreign language learning difficulties.
Funding statement
This research was funded by an ISF grant 1716/19; PIs: Altman & Walters.
Appendix
Table A1. Results of likelihood ratio tests predicting total macrostructure score

Table A2. Results of likelihood ratio tests predicting probability of producing Story Grammar elements

Note: Element = Story Grammar Element.
Table A3. Results of likelihood ratio tests predicting three microstructure measures (TW, NDW, C-units)

Note: TW = total words; NDW = number of different words; C-units = communication units.
Table A4. Results of likelihood ratio tests predicting complexity and accuracy
