
Prosody and head gestures as markers of information status in French as a native and foreign language

Published online by Cambridge University Press:  31 March 2025

Florence Baills*
Affiliation:
Universitat de Lleida, Spain; IfL-Phonetik, University of Cologne, Germany
Stefan Baumann
Affiliation:
IfL-Phonetik, University of Cologne, Germany
*Corresponding author: Florence Baills

Abstract

Prosody and gesture are two known cues for expressing information structure by emphasising new or important elements in spoken discourse while attenuating given information. Applying this potentially multimodal form-meaning mapping to a foreign language may be difficult for learners. This study investigates how native speakers and language learners use prosodic prominence and head gestures to differentiate levels of givenness.

Twenty-five Catalan learners of French and 19 native French speakers were video-recorded during a short spontaneous narrative task. Participants’ oral productions were annotated for information status, perceived prominence, pitch accents, and head gesture types. Results show that given information in French is multimodally less marked than new-er information and is accordingly perceived as less prominent. Our findings indicate that Catalan learners of French mark given information more frequently than native speakers and may transfer their use of low pitch accents to their second language (L2). The data also show that the use of head gestures depends on the presence of prosodic marking, calling into question the assumption that prosody and gesture have balanced functional roles. Finally, the type of head gesture does not appear to play a significant role in marking information status.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Natural languages signal important elements in speech through various cues, including word choice and order, prosody and gestures. For instance, new referents in discourse and constituents that the speaker wants to emphasise are key elements of a language’s information structure that listeners must recognise for successful communication. This can be achieved through prosody – the rhythm and intonation of speech – and co-speech gestures (see Kügler & Calhoun, 2020, for a review on prosody; Holler & Bavelas, 2017, for a review on gesture). However, the pragmatic functions of prosody and gesture have mostly been studied separately, even though they operate in an integrated manner (e.g., Krahmer & Swerts, 2009), and head gestures have received considerably less attention than manual gestures. In Romance languages such as Catalan and French, information structure has traditionally been associated with syntactic cues (e.g., Hamlaoui, 2009; Lambrecht, 1994; Vallduví, 1991); however, more recent research has highlighted prosody as an additional cue for signalling information structure in these languages (e.g., Féry, 2014; Forcadell, 2016).

Moreover, further research is needed to understand how language learners use prosodic and gestural cues for information structure marking, a crucial pragmatic skill to acquire. Properly marking information structure helps convey the intended meaning and emphasis in a sentence, improving measures of intelligibility (e.g., Hahn, 2004), accentedness and comprehensibility (van Maastricht et al., 2016a). Studies comparing speakers of typologically different languages have revealed clear prosodic contrasts between native and non-native speech (e.g., Catalan/Spanish and English: Mok et al., 2022; French and Dutch: Rasier et al., 2010). In the present study, we explore how native French speakers and Catalan learners of French signal information status through prosody and head gestures. While Catalan and French are typologically similar languages, their use of prosody to achieve linguistic and pragmatic functions differs, notably in signalling information status (e.g., Beyssade et al., 2015; Vanrell & Fernández-Soriano, 2013).

1.1 Background

1.1.1 Information status

The information status of an expression describes its degree of givenness or cognitive activation in the discourse, and it is a central semantic-pragmatic notion. Studying its linguistic encoding at various levels of description provides insights into human communication in general, as well as into language-specific differences in discourse structuring (e.g., Prince, 1981, 1992; Chafe, 1994; Lambrecht, 1994; Schwarzschild, 1999). The information status of individual constituents falls under the broader notion of information structure. It is closely related to concepts such as the division of sentences and utterances into topic and focus, background and focus, or theme and rheme, which serve to differentiate the common ground (e.g., topic) and the most informative part of an utterance (e.g., focus).

Researchers have proposed distinguishing between two levels of information status (henceforth IS), namely a referential and a lexical (or conceptual) level. This distinction allows for differentiating the givenness of referring versus non-referring expressions (see annotation scheme RefLex; Riester & Baumann, 2017). While the syntactic domain of a referring expression is a noun phrase representing a discourse referent, the domain of a non-referring expression is an individual word (e.g., a noun or a verb). Thus, a given referring expression is coreferential with an antecedent, representing the same entity or referent. In contrast, a given non-referring expression is simply a repeated lexical item. A detailed description of the different IS subcategories at the referential and lexical levels applied in the present study is provided in Section 2.3.2.

At both levels, IS is signalled in speech through a combination of linguistic features, including syntactic and morphological constructions (Brunetti et al., 2009; Féry & Ishihara, 2014) as well as prosodic features (Büring, 2007; Kügler & Calhoun, 2020; Baumann & Riester, 2013). Recent research highlights the role of gestures in marking information structure (Holler & Bavelas, 2017). The challenges language learners face in mastering information structure marking, particularly in acquiring syntactic and morphological features, are well-documented (e.g., Dimroth & Starren, 2003; Lozano & Callies, 2018). For example, learners may use overtly explicit speech; instead of using pronouns for given referents, they tend to repeat full lexical expressions and fail to adopt the attenuated reference forms typical of native speakers (e.g., Hendriks, 2003; Jung, 2004). However, more research is needed on the combined role of prosody and gesture in IS marking in an L2, as well as the differences between first-language (L1) and L2 speakers.

1.1.2 Prosodic marking of information status

Research on a wide variety of languages, particularly West Germanic languages, has shown that information structure can be marked by adjusting an utterance’s metrical (i.e., rhythmic) structure (see Calhoun, 2007, for English; Swerts et al., 2002, for Dutch). The metrical structure links information structural categories to their corresponding levels of prosodic prominence. The element in the structurally strongest position within a phrase receives the nuclear pitch accent. It is the only obligatory accent in an intonation unit and typically occurs near its end (e.g., Calhoun, 2010a; Cole, 2015; Ladd, 2008). Structurally weaker positions, particularly those preceding the nuclear pitch accent and, to some extent, those following it, may also receive accents depending on the informativeness of the words carrying them (Calhoun, 2010b). Nuclear prominence often indicates focussed (e.g., Jackendoff, 1972; Rooth, 1992; Selkirk, 1995) and/or new material (e.g., Halliday, 1967; Chafe, 1976; Prince, 1981), which is usually not the case with pre- and post-nuclear prominences.

A speaker’s choice of accent type may reflect both the informativeness of an element and the size of the focus domain it belongs to, which in turn influences perceived prosodic prominence (e.g., Dahan et al., 2002; Watson et al., 2005; and, for a general overview, Wagner & Watson, 2010). Studies on German and English have shown that narrow and contrastive foci – where typically only one constituent is structurally highlighted (e.g., Did you talk to Tom? No, I talked to [Jane], with Jane in contrastive focus) – are marked by more prominent accent types. An ‘accent type hierarchy’ for German has been established in a controlled prominence rating study, which found an increase in perceived prominence from no accents through falling pitch accents and high accents to rising pitch accents (Baumann & Röhr, 2015). Crucially, German corpus and perception data (Baumann & Riester, 2013; Röhr & Baumann, 2011) reveal distinct correspondences: newness is linked to prominent (i.e., high and rising) pitch accents, bridging relations are associated with less prominent (i.e., low and falling) accents, and givenness correlates with prenuclear accents or deaccentuation. It has often been shown that givenness may trigger deaccentuation in many languages, particularly in West Germanic (see Cruttenden, 2006, for a comparison of 20 languages). However, even in English or German, there is no one-to-one relationship between givenness and the lack of pitch accent, especially in spontaneous speech (see Hirschberg, 1993; Baumann & Riester, 2013).
Furthermore, syntactic cues, such as the same or a different grammatical role of a previously mentioned word, may also influence a word’s prosodic prominence level (Terken & Hirschberg, 1994).

Prosodic marking of information status in Catalan and French. Many Romance languages, including Catalan (Vallduví, 1991) and French (Hamlaoui, 2009; Lambrecht, 1994), encode focus and IS primarily through syntactic transformations such as clefting and dislocations (Klein, 2012; Vallduví, 2008). In contrast, pitch accents generally indicate prosodic phrase structure (usually tied to grammatical units) with little evidence of deaccentuation (e.g., Ladd, 2008). In French, dislocations are common and usually matched by clitic pronouns in the core, as in the sentence La FÊTE DE DEMAIN, Claire l’organise (‘Claire is organising TOMORROW’S PARTY’). Other possible syntactic constructions are clefts – C’est CLAIRE qui organise la fête de demain (‘CLAIRE is organising tomorrow’s party’) – and presentational constructions – Il y a une FÊTE demain (‘There is a PARTY tomorrow’). In declaratives, left dislocations and clefts are mostly characterised by a rising intonation contour (Di Cristo, 1998; Mertens, 2008). Nevertheless, independent of information structure, accent placement in French is believed to be constrained by prosodic phrasing and to signal phrase structure at the accentual phrase (AP) level, defined as the minimal prosodic domain. The AP may correspond to a word, phrase or clause and is realised by an optional initial high accent and a compulsory final accent, which is generally high when non-final and low in utterance-final nuclear position (Delais-Roussarie et al., 2015; Jun & Fougeron, 2000).
Both initial and final accents have been claimed to contribute to focus marking when they feature steeper contours (German & D’Imperio, 2016; Michelas & German, 2020). Di Cristo (2000) proposed that pitch range compression between an initial high accent and a high nuclear pitch accent (an ‘accentual arc’) can be a marker of focus. This configuration involves at least two APs whereby the initial rise in the first AP is enhanced and is followed by a lower pitch range and a low pitch accent, while the high accent in the second AP is also enhanced (Portes & Reyle, 2022). Furthermore, there is some evidence of an equivalent to deaccentuation on given information signalled by post-focal tonal compression (i.e., reduction in pitch range; Delais-Roussarie et al., 2002; Féry, 2001, 2014; Jun & Fougeron, 2000) and increased speech rate (Portes & Reyle, 2022). In such cases, some authors have claimed that the nuclear pitch accent (H*) preceding the deaccented elements of the sentence may mark new or focussed information (e.g., Beyssade et al., 2015; Dohen & Loevenbruck, 2004; Rasier et al., 2010) or that a separate AP is created with the focussed constituent receiving the nuclear pitch accent (Féry, 2001).

In Catalan, broad focus statements – where all information is new – are characterised by high prenuclear accents on lexical constituents and low nuclear pitch accents at the end of the intonational phrase (Forcadell, 2016; Prieto, 2014; Vanrell & Fernández-Soriano, 2013). Prenuclear accents are the correlates of lexical stress (i.e., a lexical property of a word that specifies which syllable will be more prominent than the others), which does not exist in French. Following Vallduví (1993, 2008), syntax is frequently used to express the partition between old/shared and new/emphasised information by right dislocating the elements of the sentence containing the new/emphasised piece of information. In this case, the element carrying new information is in the final sentence position and receives a low nuclear pitch accent. For instance, in the sentence La festa de demà, l’organitza la MARIA (‘MARIA is organising tomorrow’s party’), the new information consists of knowing who is organising the party. A recent quantitative study confirmed that accessible information in broad focus as well as contrastive and corrective focus are often marked with a low pitch accent and found no evidence of the use of more prominent pitch accent types (high or rising accents) with accessible and focussed constituents in Catalan (Gregori et al., 2024).

Prosodic marking of information status in a foreign language. Ample evidence suggests that language learners frequently transfer prosodic patterns from their native language to their foreign speech, particularly in intonation and accentuation (e.g., Mennen, 2007; Ortega-Llebaria & Colantoni, 2014, among many others). Several studies have highlighted the challenges language learners face in prosodically expressing focus (e.g., contrastive focus) in a foreign language, particularly in L2 English (e.g., Fujimori et al., 2022; O’Brien & Gut, 2011; Ramírez-Verdugo, 2006; Yan & Calhoun, 2022). Similar findings have been reported for Spanish (Sánchez Alvarado & Armstrong, 2022; van Maastricht et al., 2016b), Persian (Hosseini, 2013) and Dutch (van Maastricht et al., 2016b). Learners of English from various native language backgrounds often struggle to distinguish between given and new words in terms of accent placement, pitch accent type and pitch movement amplitude (Gut & Pillai, 2014; Ramírez-Verdugo, 2002; Wennerstrom, 1998). Furthermore, Rasier et al. (2010) found that French learners of Dutch – a language with strong prosodic marking of IS – fail to deaccent given information as extensively as native Dutch speakers.

Based on previous studies, we expect Catalan speakers to face similar challenges in prosodically marking IS in an L2. This is due not only to the syntactic strategies discussed above but also to the fact that lexical stress in Catalan is maintained in unaccented contexts. As a result, word prominence may perceptually compete with phrasal accents (Ortega-Llebaria & Prieto, 2011).

1.1.3 Gestural marking of information status

Many studies have shown that manual and non-manual gestures (head nods and eyebrow movements) are temporally aligned with speech to fulfil a variety of functions, including the marking of both focus and IS (e.g., Ambrazaitis & House, 2017; Loehr, 2012; McNeill, 1992; Jannedy & Mendoza-Denton, 2005; Kendon, 2004; McClave, 2000). More specifically, there is ample evidence of the temporal association between prosodic prominence (stressed or accented syllable) and the prominence (strokes or apexes) of referential manual gestures, which convey meaning, as well as non-referential gestures (e.g., Esteve-Gibert & Prieto, 2013; Karpiński et al., 2009; Leonard & Cummins, 2011; Loehr, 2012; Rohrer, 2022; Shattuck-Hufnagel & Ren, 2018; Türk, 2020; Yasinnik et al., 2004). Regarding non-manual gestures, several studies have shown a tight temporal alignment of eyebrow raises and head gestures with speech (Alexanderson et al., 2013; Ambrazaitis & House, 2017; Esteve-Gibert et al., 2017; Flecha-García, 2010; Swerts & Krahmer, 2010). For instance, Swerts and Krahmer (2010) investigated both head and eyebrow movements in a Dutch news reading corpus: strongly prominent words tended to co-occur with both head and eyebrow movements (67%), while weakly prominent words were mostly produced without gestures (47%), with only head movement (16%), or only eyebrow movement.
Interestingly, the production of head gestures affects not only how prominence is perceived but also the prominence of speech itself (Krahmer & Swerts, 2007).

While there is a large body of empirical evidence showing that referential manual gestures in speech tend to mark the introduction of new referents (e.g., Debreslioska et al., 2013; Ebert et al., 2011; Gullberg, 2003; Levy & Fowler, 2000; Levy & McNeill, 1992; Marslen-Wilson et al., 1982; Yoshioka, 2008) and accessible referents (Debreslioska & Gullberg, 2020, 2022; Rohrer, 2022), only a few studies have examined the combined effect of prosody and gestures, which we call multimodal marking. Türk (2020) investigated the multimodal marking of information structure in Turkish and found that both referential and non-referential manual gestures were systematically synchronised with pitch accents and tended to be paired with foci and contrastive elements. Looking at the role of head gestures in information structure marking, Esteve-Gibert et al. (2021) found that French children consistently use head gestures rather than prosody to mark contrastive focus while also observing increased duration and pitch range on the words marked by head gestures. Along the same lines, Gregori et al. (2024) found that the presence of head gestures, but not hand gestures or eyebrow movements, correlates with the size of the focus domain in Catalan and German; that is, the smaller (and more contrastive) the focus domain, the more gestures were produced.
Rohrer (2022) analysed the multimodal marking of IS in an American English TED Talk and found that new and accessible referents were significantly more frequently marked by both manual gestures and prosody, while given referents received either only prosodic or no marking. In addition, new and accessible referents were perceived as significantly more prominent than given referents. Im and Baumann (2020) suggested a direct mapping between pitch accent type and IS marking in American English TED Talks. They found that new information was most frequently marked by pitch accents alone, followed by pitch accents co-occurring with manual gestures, and that the type of pitch accent was a modulator of gestural marking. Less prominent pitch accents (low and downstepped accents) tended to occur more often in words that were not marked by a gesture, whereas more prominent pitch accents (high and rising accents) were more often accompanied by a gesture. Regarding the gestural marking of IS, the results suggest that words that introduce new and unique entities or whose information status has to be derived from previous discourse (see examples and respective RefLex labels in Section 2.3.2) are strong predictors of the occurrence of non-referential gestures.

Gestural marking of information status in a foreign language. Studies on the gestural marking of IS in a foreign language have primarily focussed on demonstrating that the overtly explicit marking of referents in speech has a counterpart in gesture. Gullberg (2003, 2006) found that Swedish learners of French and French learners of Swedish with low proficiency repeatedly used nominal expressions instead of pronouns when referring to previously mentioned entities. They frequently accompanied them with manual gestures, placing the referents in different spatial locations. While Gullberg’s results (Gullberg, 2003, 2006) are linked to the acquisition of languages with complex pronominal forms, Yoshioka (2008) investigated whether this overtly explicit use of manual gestures might be related to general difficulties in structuring information. She found that Dutch learners of Japanese, a language that does not actively use pronouns, overtly specified referents in speech and gestures. However, this pattern may not persist in highly proficient speakers (Azar et al., 2017). So et al. (2013) compared proficient and less proficient Mandarin learners of English. They found that proficient L2 English speakers used gestures more frequently when referents were not lexically specified, a pattern similar to that observed by Yoshioka (2008) in Japanese native speakers. In contrast, less proficient L2 English speakers produced more explicit nominal referential expressions instead of pronouns and accompanied them with gestures. Mok et al. (2022) investigated the multimodal marking of focus by Catalan learners of English and found that prosodic cues and head gestures were systematically coupled but did not reflect focus marking. That is, no distinction was made between different types of focus (broad, contrastive and corrective), and participants overemphasised unexpected parts of the utterance. Overall, for non-native speakers, distinguishing new from given information might involve more gestural marking, even if it is not done correctly.

1.2 The present study

Overall, there is limited evidence on how both native speakers of a Romance language and learners whose native language is also Romance use prosodic and gestural prominence to indicate IS. This exploratory study aims to assess the role of prosody and head gestures in marking IS in French, both as a native and foreign language. To achieve this, we annotated and analysed an audiovisual corpus consisting of 19 native French speakers and 25 Catalan learners of French. The analysis focused on IS categories, perceived prominence, the presence and type of pitch accents and the presence and type of head gestures to address the following research questions:

1. Do native French speakers and Catalan learners of French use prosody, head gestures or a combination of both to mark information status?

For this research question, our predictions based on the reviewed literature are as follows:

a. Native French speakers:

  - At the referential level, pronouns (always labelled as given referents) will not be marked prosodically, but coreferential (i.e., r-given) expressions, in their full or pronominal form, may be pitch accented if they occur at the end of an AP.

  - At the lexical level, clear cases of total deaccentuation (as is common in West Germanic languages) will be rare.

  - Regarding gestural marking, gesture and prosody are considered an integrated system and focus marking may trigger an increase in head gesture frequency (Section 1.1.3). Therefore, we expect L1 French speakers to produce head gestures together with pitch accents on marked referents, and more head gestures should co-occur with new-er information.

b. Catalan learners of French:

  - At the referential level, we might observe more marked given referents than in L1 French, both with prosody and gesture.

  - At the lexical level, we expect little distinction between given and new information in terms of prosodic marking if they use the same syntactic strategies as in their L1: pitch accents will occur both in dislocated and main clauses for prosodic phrasing. In addition, there may be a high number of pitch accents stemming from lexical stress.

  - Regarding gestural marking, Catalan learners of French may overexplicitly mark given information at the referential level.

2. If information status is marked by prosody and head gestures, how does a word’s information status relate to its perceived level of prominence, pitch accent type and head gesture type?

For this research question, our predictions based on the reviewed literature are as follows:

a. Native French speakers:

  - Prosodic prominence: A stronger degree of prominence is expected with increased newness.

  - Pitch accent type: High and rising pitch accents may be associated with new-er information, while low and falling pitch accents may express more given information. Nevertheless, this relationship may not be as clear-cut as in West Germanic languages. We expect tonal compression as a general strategy for attenuating given or backgrounded information rather than using a specific type of accent.

b. Catalan learners of French:

  - Prosodic prominence: A stronger degree of prominence is expected with increased newness.

  - Pitch accent type: We may find higher proportions of low pitch accents, given their association with broad focus statements and dislocated new/focussed elements.

Regarding head gesture types, our study will explore their contribution to IS and prominence, as previous studies have only mentioned nods (e.g., Swerts & Krahmer, 2010) or head beats (e.g., Ambrazaitis & House, 2017) in marking focussed information, without addressing the relative importance of different head gesture types (e.g., Esteve-Gibert et al., 2017; Mok et al., 2022).
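As an illustration only, the first research question can be thought of as a cross-tabulation of each word's information status against the way it is marked (prosody only, head gesture only, both, or neither). The following Python sketch shows one possible way to tally such a table. The `Token` structure, the field names and the simplified labels "given" and "new" are hypothetical and do not reproduce the annotation scheme or the statistical analysis used in the study.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word (hypothetical structure, for illustration only)."""
    word: str
    info_status: str    # simplified, hypothetical labels such as "given" or "new"
    pitch_accent: bool  # True if the word carries a pitch accent
    head_gesture: bool  # True if a head gesture co-occurs with the word


def marking_type(token: Token) -> str:
    """Classify a word by the modalities that mark it."""
    if token.pitch_accent and token.head_gesture:
        return "both"
    if token.pitch_accent:
        return "prosody_only"
    if token.head_gesture:
        return "gesture_only"
    return "unmarked"


def tally_marking(tokens):
    """Count marking types separately for each information status label."""
    counts = defaultdict(Counter)
    for token in tokens:
        counts[token.info_status][marking_type(token)] += 1
    return {status: dict(counter) for status, counter in counts.items()}


# Invented toy data, not taken from the corpus.
example_tokens = [
    Token("amie", "new", True, True),
    Token("elle", "given", False, False),
    Token("Paris", "new", True, False),
    Token("amie", "given", True, False),
]
marking_table = tally_marking(example_tokens)
```

A table of this kind would only describe raw frequencies; any inferential comparison between speaker groups would require the modelling choices reported by the authors.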

2. Methods

2.1 Participants

The participants included 19 native French speakers (M age = 34.3 years, SD = 10.1) and 25 bilingual Catalan-Spanish speakers (M age = 19.6 years, SD = 0.91) who were learning French as a second foreign language. The French speakers were recruited from fifteen different areas across France and had no prior exposure to the Catalan language. Nine were graduate or undergraduate students, while ten worked in the private sector. The Catalan-Spanish bilingual learners of French were all second-year undergraduate students in Translation and Interpretation Studies at Pompeu Fabra University, Barcelona. English was their first foreign language, followed by French as their second. Both languages were part of their university coursework. They self-reported holding official French diplomas at levels A2 (20%), B1 (44%) or B2 (36%), with B2 as the university’s instructional level. The oral narrative task used for data collection was integrated into their curriculum as two homework assignments for their French course, spaced four weeks apart. The two topics selected for the narrative task were discussed in French seminars and were appropriate for their proficiency level. Thus, we expected participants to produce speech sufficient for reliable data collection. The study was approved by the Department of Translation and Language Sciences as well as the instructors of the French course. All participants signed a consent form granting permission for their auditory and visual performances to be recorded and analysed.

2.2 Elicitation task

The participants were instructed to record a short spontaneous monologue on video. Both native French speakers and Catalan learners of French were given specific guidelines to ensure comparable oral productions:

Explain in a few sentences in French who your best friend is. How is he/she physically? How did you meet? What do you like and dislike about him/her?

Additionally, the Catalan learners of French were asked to discuss their Erasmus experience (i.e., a three-month study abroad programme at a partner university) from the previous year:

Please explain your Erasmus stay in French in a few sentences. Where did you go, and when? Did everything go smoothly? What are your best memories? Would you like to live in a foreign country again?

Participants received a link from the experimenter directing them to an online survey platform (https://www.alchemer.com) where they could read the recording instructions and task guidelines. No specific instructions were provided regarding head or body movement. They were asked to use their laptop camera and record their response using webcamera.io (https://webcamera.io/, quality 720p). The video files were then submitted via a shared Google Drive folder accessible to the experimenter (and for the Catalan learners, also their teachers).

2.3 Data treatment and annotations

We obtained 50 video recordings for the Catalan learners of French (two narratives each by 25 participants, Total phonation time = 38.4 min, M phonation time = 46 s, SD = 19 s) and 19 videos for the native French speakers (Total phonation time = 19.46 min, M phonation time = 58 s, SD = 19 s). For each file, auditory information was extracted as sound files (WAV, 48 kHz, 16-bit) using Adobe Premiere Pro. The sound files were annotated automatically in Praat (Boersma & Weenink, 2022) for words, syllables and phonemes using Easy Align (Goldman, 2011), and inexact boundaries were manually corrected by the first author.

2.3.1 Perceived prominence and prosodic annotations

Pitch accents and prominence annotations were independently performed by the first author, a native French speaker. First, prominence was perceptually annotated in Praat following the DIMA annotation system guidelines (“Deutsche Intonation – Modellierung und Annotation”, Kügler et al., 2022). Words were assigned weak (level 1), strong (level 2) or extra-strong (level 3) prominence labels based on their perceived salience in the utterance relative to neighbouring words. Words not labelled as prominent were later automatically assigned a prominence score of 0.

For phrasing, we adopted the F_ToBI annotation guidelines (“French_tones and break indices”, Delais-Roussarie et al., 2015), structuring utterances from smaller to larger rhythmic units: (i) APs (minimally consisting of a lexical word and its function words), (ii) intermediate phrases (potentially grouping several APs) and (iii) prosodic phrases (typically signalling the end of an utterance). Pitch accents were then annotated, with each AP containing one optional prenuclear initial accent (Hi) and one obligatory nuclear pitch accent. Pitch accents consisted of high (H) or low (L) pitch patterns or combinations of these (H and L), indicating relative highs and lows in the intonation contour. Their phonetic realisation varied with respect to their pitch range and the shape of the preceding pitch accents in the phrase. Given that L2 speech often deviates from the prosodic patterns predicted by ToBI phonological annotations, we followed a more phonetic annotation approach, as advocated by Hualde and Prieto (2016). In our data, we annotated the following pitch accents: L*, HL*, L*H, !H*, H*, H*L, LH* and HH*. Filled pauses longer than 400 ms and false starts – common in spontaneous speech and potential disruptors of prominence perception – were also annotated and later excluded from the analysis.

To assess inter-annotator reliability, a second annotator, an English-French bilingual with expertise in prosodic analysis, independently annotated prominence and pitch accents on 20% of the data. For prominence annotations, substantial Cohen’s Kappa scores were obtained for both groups: .71 for the Catalan learners of French (z = 29.9, p < .001, CI [.67, .75]) and .85 for the native French speakers (z = 34.6, p < .001, CI [.76, .83]). As to pitch accent annotations, a substantial Cohen’s Kappa score of .61 (z = 30.2, p < .001, CI [.50, .60]) was obtained for the Catalan learners of French and .67 (z = 27.8, p < .001, CI [.61, .72]) for the native French speakers. Annotations performed by the first author were retained for the analysis. However, because the greatest divergence occurred in the Hi pitch accent annotations, these were reviewed and revised by consensus between both annotators.
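
As an illustration of the agreement statistic reported above, the following minimal sketch computes an unweighted Cohen's kappa for two annotators. It assumes the unweighted variant (the text does not specify whether a weighted kappa was used), and the label sequences are hypothetical, not data from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical prominence labels (0-3) for eight words; not data from the study.
annotator_1 = [0, 1, 2, 0, 3, 1, 0, 2]
annotator_2 = [0, 1, 2, 0, 2, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on six of eight words, and the chance-corrected score is lower than the raw agreement of .75.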

2.3.2 Information status annotations

For IS annotation in this study, we applied the RefLex system (Riester & Baumann, 2017), which analyses expressions at both referential and lexical levels. To the best of our knowledge, this is the first application of RefLex to French. We used a simplified inventory of the system, previously applied to American English by Im and Baumann (2020). Table 1 presents the labels selected for our dataset, along with their descriptions and corresponding English examples (Riester & Baumann, 2017; Riester & Baumann, 2013; Rooth, 1992). The labels are ordered according to their degree of givenness, from given to new.

Table 1. RefLex annotation labels used in this study. Words and phrases in boldface are examples of the respective labels

Participants’ narrative speech transcriptions were exported from Praat and imported into a spreadsheet (one word per line), together with each participant’s reference number and each story’s reference number. A word may receive a label at the r-level (if it is part of a referring expression), at the l-level (if it is a content word), at both levels, or remain unannotated. Annotations were performed by the second author.

2.3.3 Head gesture annotations

The video recordings were coded in ELAN (version 6.3) for non-referential head gestures – head movements accompanying speech that did not refer to or represent a specific object, action or concept. Head movements conveying semantic meaning, such as rapid head shaking for negation or repeated up-and-down movements for approval, were excluded. We applied the M3D annotation scheme, which classifies head gestures into five types (“Multimodal Multidimensional gesture labeling”, Rohrer et al., 2023, p. 23; see Wagner et al., 2014, p. 212, for an illustration).

  • Nod: Upward and downward movements of the head

  • Turn: Rotational movements of the head (to the left or right)

  • Tilt: The top of the head moves in one direction (left or right), and the chin moves in the opposite direction

  • Slide: The entire head shifts laterally from left to right

  • Protrusion: The entire head moves forward or backward

After observing our corpus, we identified a small number of upward nods and retractions (backward protrusions). Following the methodological approach of Ludusan et al. (2023), we merged these into their respective categories, nods and protrusions.

Gesture annotation was performed by the first author. First, gesture strokes were identified and annotated by locating the onset and offset of the most salient kinematic movement of the head. This was done through frame-by-frame observation (30 ms) without audio, to prevent speech influence. The non-obligatory preparation and retraction preceding and following the stroke were not considered. Gesture type classification was based on movement observed at normal velocity. When a stroke contained more than one gesture type (e.g., nod + protrusion or tilt + turn), the perceptually most salient type was selected. In a second step, the apex, which refers to the most pronounced point of movement, was identified. In non-referential head gestures, the apex typically occurred at the end of the stroke, just before the head returned to its neutral or starting position. If the head remained in a holding position before resuming movement, the apex was marked at the end of the stroke, just before the hold phase.

To assess inter-annotator reliability, 20% of the data was independently annotated by a fellow linguist specializing in gesture studies. Gesture stroke reliability was evaluated using Staccato (Lücking et al., 2012), an ELAN-integrated tool that measures temporal overlap between annotations based on Thomann’s technique (Thomann, 2001). For the Catalan learners of French, Staccato produced a mean degree of overlap of 73.9%. Gesture-type reliability was assessed using Cohen’s kappa, yielding a score of .73 (z = 19.2, p < .001, CI [0.67, 0.80]), indicating substantial agreement. For the French speakers, the degree of overlap was 79%, with a Cohen’s kappa score of .79 (z = 18.6, p < .001, CI [0.73, 0.86]). Annotations performed by the first author were retained for analysis.

2.4 Analyses

The transcription of words, along with prominence and prosodic annotations performed in Praat, was imported into ELAN (see example in Figure 1), enabling the creation of a time-aligned database for further processing in R (version 4.1.2, 2021). IS annotations were imported via R into the final databases. To enhance readability in the results, pitch accents were grouped into five categories based on their shape: low (L*), falling (H*L, HL*), high (!H*, H*) and rising (L*H, LH*, HH*), plus the initial accent (Hi).
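
The grouping of pitch accent labels into shape categories can be sketched as a simple lookup. The mapping below follows the categories named in the text and in the note to Figure 4 (which lists !H* under "high"); the function name, the "initial" and "none" category names and the handling of unknown labels are illustrative assumptions, not the authors' R code.

```python
# Label-to-category mapping as described in the text; category names are illustrative.
ACCENT_CATEGORY = {
    "L*": "low",
    "H*L": "falling", "HL*": "falling",
    "!H*": "high", "H*": "high",
    "L*H": "rising", "LH*": "rising", "HH*": "rising",
    "Hi": "initial",
}

def accent_category(label):
    """Map an annotated pitch accent label to its shape category.

    Words without a pitch accent (label None) are returned as "none";
    an unknown label raises an error rather than being silently regrouped.
    """
    if label is None:
        return "none"
    if label not in ACCENT_CATEGORY:
        raise ValueError(f"unknown pitch accent label: {label!r}")
    return ACCENT_CATEGORY[label]

# Hypothetical sequence of annotated labels, one per word.
example = [accent_category(x) for x in ["Hi", "LH*", None, "H*L", "!H*"]]
```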

Figure 1. ELAN screenshot with the annotation layers for text (word, syllable), head gesture (type, apex), prosody (phrase ends, pitch accents and boundary tones) and perceived prominence levels.

To address the first research question, namely whether native French speakers and Catalan learners of French mark IS with pitch accents, head gestures or both, we examined the temporal overlap between (i) words labelled at the r-level and/or l-level and the F0 turning point of a pitch accent and (ii) words labelled at these levels and the apex of a head gesture. For each IS category, we computed counts and frequencies for each type of marking. For statistical analyses, as recommended by Winter (2020) for count data, two generalized linear mixed-effects models (GLMERs) with Poisson regression and a random intercept for participants were implemented in R (R Core Team, 2022) using the lme4 package (Bates et al., 2015). To account for between-category differences due to unequal occurrences, the models were offset by the total number of items in each category. Likelihood ratio tests were used to assess the significance of fixed effects. For the r-level, the dependent variable was the number of referential expressions, and the fixed factors were PROFICIENCY (L1 speakers, L2 speakers), IS REFERENTIAL LEVEL (four levels: r-given, r-bridging, r-unused and r-new) and IS MARKING (four levels: no marking (i.e., neither head gesture nor pitch accent), head gesture only, pitch accent only and pitch accent + head gesture) as well as their interactions. For the l-level, the dependent variable was the number of lexical expressions, and the fixed factors were PROFICIENCY (L1 speakers, L2 speakers), IS LEXICAL LEVEL (three levels: l-given, l-accessible and l-new) and IS MARKING (four levels: no marking, head gesture only, pitch accent only and pitch accent + head gesture) as well as their interactions. Significant effects were assessed via omnibus test results, followed by a series of Benjamini-Hochberg’s false discovery rate (FDR) pairwise tests conducted with the emmeans package (Lenth, 2021).
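
The overlap criterion described above can be sketched as follows: a word counts as marked by a pitch accent (or a head gesture) when the F0 turning point (or the gesture apex) falls within the word's time boundaries. This is a minimal sketch, not the authors' R code; the function name, the inclusive treatment of boundaries and the example words and times are all assumptions made for illustration.

```python
from collections import Counter

def classify_marking(word_start, word_end, accent_times, apex_times):
    """Return the IS MARKING level for one word (all times in seconds).

    accent_times: time points of F0 turning points of pitch accents.
    apex_times: time points of head gesture apexes.
    Word boundaries are treated as inclusive (an assumption).
    """
    def inside(t):
        return word_start <= t <= word_end

    has_accent = any(inside(t) for t in accent_times)
    has_gesture = any(inside(t) for t in apex_times)
    if has_accent and has_gesture:
        return "pitch accent + head gesture"
    if has_accent:
        return "pitch accent only"
    if has_gesture:
        return "head gesture only"
    return "no marking"

# Hypothetical words: (form, start, end, r-level label); not data from the study.
words = [
    ("elle", 0.00, 0.15, "r-given"),
    ("amie", 0.40, 0.80, "r-new"),
    ("Paris", 1.10, 1.60, "r-unused"),
]
accent_times = [0.65, 1.45]
apex_times = [0.70]

# Count marking types per IS category, as in the descriptive tables.
counts = Counter(
    (label, classify_marking(start, end, accent_times, apex_times))
    for _, start, end, label in words
)
```

Counts of this kind, tabulated per participant, would form the dependent variable of the Poisson models, with the category totals entering as the offset.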

Regarding the second research question, to examine whether IS relates hierarchically to perceived prominence, two linear mixed-effects models (LMERs) were run with mean prominence as the dependent variable and IS REFERENTIAL LEVEL (four levels: r-given, r-bridging, r-unused and r-new) and PROFICIENCY (L1 speakers, L2 speakers) or IS LEXICAL LEVEL (three levels: l-given, l-accessible and l-new) and PROFICIENCY (L1 speakers, L2 speakers) as fixed factors. For pitch accent types and head gesture types, two GLMERs with Poisson regression were applied at both IS levels (referential, lexical). The dependent variable was the number of occurrences at each level and, depending on the model, PITCH ACCENT TYPE (five levels: Hi, low, falling, high, rising) and PROFICIENCY (L1 speakers, L2 speakers) or HEAD GESTURE TYPE (five levels: nod, protrusion, slide, tilt, turn) and PROFICIENCY (L1 speakers, L2 speakers) as fixed factors. Interactions were then assessed via omnibus test results using a series of Benjamini-Hochberg’s FDR pairwise tests. Regarding the report of the post hoc comparisons in the results of the second research question, we chose to ignore the differences related to no pitch accent or no head gesture, since we are interested in finding the relationship between the actual pitch accent and head gesture types and IS categories. Nevertheless, the complete dataset, R code and exhaustive statistical results of this study are available at https://osf.io/7d8yh.
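
For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following minimal sketch shows how adjusted p-values are obtained from a family of pairwise tests. It illustrates the general procedure only; the study itself relied on the emmeans package in R, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

A comparison is then declared significant when its adjusted p-value falls below the chosen threshold (e.g., .05).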

3. Results

In this section, we report the results of the descriptive and inferential statistics regarding the marking of information status with pitch accents and/or gestures and the concomitant relationship between (prosodic) prominence, pitch accent types and head gesture types.

3.1 Research Question 1: Do native French speakers and Catalan learners of French mark information status using prosody, head gestures or both in combination?

3.1.1 Referential level

Table 2 shows the frequencies of expressions marked at the referential level and the total number of pitch accents and head gestures that were produced within the boundaries of an r-level expression, that is, when there was a temporal overlap between the word annotated as a referent and a pitch accent or a head gesture. Figure 2 shows the proportion of different cues for each r-level category (see Appendix A1 with the number of occurrences in each category).

Table 2. Number of expressions annotated at the referential level in each category and total number of pitch accents and head gestures marking these expressions

Figure 2. Marking of information status (IS) at the referential level.

At the r-level, the results of the GLMER show significant main effects of IS MARKING (χ2(3) = 313.73, p < .0001) and IS REFERENTIAL LEVEL (χ2(3) = 42.82, p < .0001), as well as significant two-way interactions between PROFICIENCY and IS MARKING (χ2(3) = 35.34, p < .0001) and IS REFERENTIAL LEVEL and IS MARKING (χ2(9) = 406.83, p < .0001). Furthermore, the analysis reveals a significant three-way interaction between PROFICIENCY, IS REFERENTIAL LEVEL and IS MARKING (χ2(9) = 29.29, p < .001).

Native French speakers. We find a clear tendency for given information to be less prominently encoded than new-er information: Compared to r-bridging, r-unused and r-new expressions, r-given expressions are significantly more often unmarked (z = 3.22, p < .01; z = 7.74, p < .0001; z = 7.83, p < .0001), significantly less often marked with pitch accent only (z = 7.10, p < .0001; z = 4.55, p < .0001; z = 6.10, p < .0001) as well as with pitch accent + head gesture (z = 6.27, p < .0001; z = 7.20, p < .0001; z = 6.85, p < .0001). Furthermore, within the r-given category, we find significantly more unmarked expressions than referential expressions marked with head gesture only, pitch accent only or pitch accent + head gesture (z = 11.70, p < .0001; z = 13.25, p < .0001; z = 13.59, p < .0001). Conversely, there are significantly more referential expressions with pitch accent only or pitch accent + head gesture than unmarked expressions in the r-unused (z = 3.69, p < .001; z = 4.60, p < .0001) and r-new categories (z = 4.02, p < .001; z = 3.80, p < .05) and more referential expressions with pitch accent only than unmarked expressions in the r-bridging category (z = 2.42, p < .05).

We do not find any significant differences between pitch accent only and pitch accent + head gesture. However, the use of head gesture only is significantly less frequent than both of them with r-unused (z = 2.41, p < .05; z = 2.76, p < .05) and r-new expressions (z = 3.32, p < .01; z = 3.23, p < .01) and also compared to pitch accent only with r-given expressions (z = 3.11, p < .01).

R-bridging expressions trigger more marking with pitch accent only compared to r-unused and r-new expressions (z = 3.27, p < .01; z = 2.25, p < .0504, near-significant).

Catalan learners of French. Similar to the native French speakers, given information is less prominently encoded than new-er information: R-given expressions are more often unmarked compared to r-bridging, r-unused and r-new expressions (z = 2.44, p < .05; z = 7.06, p < .0001; z = 5.18, p < .0001). In addition, r-given expressions are marked significantly less often with pitch accent only or pitch accent + head gesture compared to r-bridging (z = 4.24, p < .0001; z = 6.67, p < .0001), r-unused (z = 5.40, p < .0001; z = 9.41, p < .0001) and r-new expressions (z = 2.95, p < .01; z = 8.74, p < .0001). Within the r-given category, there are more unmarked expressions than expressions marked with head gesture only, pitch accent only or pitch accent + head gesture (z = 10.40, p < .0001; z = 5.68, p < .0001; z = 11.58, p < .0001) and there are more expressions marked with pitch accent only than with pitch accent + head gesture (z = 7.58, p < .0001). Conversely, there are significantly more referential expressions with pitch accent only or pitch accent + head gesture than unmarked expressions within the r-unused (z = 7.09, p < .0001; z = 5.85, p < .0001) and r-new categories (z = 3.48, p < .01; z = 3.22, p < .01) and more referential expressions with pitch accent only than unmarked expressions in the r-bridging category (z = 2.79, p < .05).

New information is less consistently marked than accessible information: R-new expressions are more frequently unmarked than r-unused expressions (z = 2.47, p < .05).

The use of head gesture only is rare: We find significantly fewer expressions with head gesture only than expressions with pitch accent only and pitch accent + head gesture in the r-given (z = 8.18, p < .0001; z = 3.42, p < .05), r-unused (z = 4.77, p < .001; z = 4.21, p < .05) and r-new (z = 5.31, p < .0001; z = 5.18, p < .0001) categories.

Comparing native French speakers and Catalan learners of French. We find significantly more unmarked r-given expressions in the native French speakers (z = 3.13, p < .01) and more r-given expressions marked with pitch accent only in the Catalan learners of French (z = 4.71, p < .0001).

3.1.2 Lexical level

Table 3 shows the frequencies of expressions marked at the lexical level and the total number of pitch accents and head gestures produced within l-level constituents. Figure 3 shows the proportions of cues for each l-level category (see Appendix A2 for the number of occurrences in each category).

Table 3. Number of expressions annotated at the lexical level in each category and number of pitch accents and head gestures marking these expressions

Figure 3. Marking of information status (IS) at the lexical level.

At the l-level, the results of the GLMER show significant main effects of PROFICIENCY (χ2(1) = 6.66, p < .01), IS MARKING (χ2(3) = 867.54, p < .0001) and IS LEXICAL LEVEL (χ2(2) = 68.42, p < .0001), as well as significant two-way interactions between PROFICIENCY and IS MARKING (χ2(3) = 31.95, p < .0001) and IS LEXICAL LEVEL and IS MARKING (χ2(6) = 49.51, p < .0001). However, the three-way interaction was not statistically significant.

Considering French L1 and L2 speakers together, there is evidence that new-er expressions are more marked than given expressions: L-given words are significantly more often unmarked than l-accessible and l-new words (z = 3.52, p < .01; z = 4.99, p < .0001). Within the l-new category, there is a significantly more frequent use of pitch accent only and pitch accent + head gesture than unmarked items (z = 15.92, p < .0001; z = 14.67, p < .0001).

Nevertheless, given information still shows strong marking: There are more l-given expressions marked with pitch accent only than unmarked ones (z = 2.41, p < .05).

We find that head gestures are rarely used without a pitch accent: There are significantly fewer expressions with head gesture only than expressions with pitch accent only and pitch accent + head gesture in the l-new category (z = 17.69, p < .0001; z = 17.07, p < .0001) and expressions with pitch accent only in the l-given category (z = 2.26, p < .05; see Footnote 2).

The association between pitch accent and gesture seems stronger for accessible words: There are, in general, more l-accessible words marked with pitch accent + head gesture compared to l-new words (z = 5.54, p < .0001), while l-new words are significantly more often marked with pitch accent only compared to l-accessible words (z = 3.32, p < .01).

3.2 Research Question 2: Is there a relationship between a word’s information status and its perceived level of prominence, the type of pitch accent and the type of head gesture used by the participants?

3.2.1 Perceived prominence

The relative prominence of each information category at the r- and l-levels for native French speakers and Catalan learners of French is shown in Table 4.

Table 4. Mean perceived prominence and standard deviation for the information status categories at the r- and l-levels

Note: The scores for the prominence ratings ranged from 0 to 3.

R-level. The results of the LMER show a significant effect of PROFICIENCY (χ2(1) = 5.70, p < .05) and IS REFERENTIAL LEVEL (χ2(3) = 737.80, p < .0001), as well as a significant interaction between PROFICIENCY and IS REFERENTIAL LEVEL (χ2(3) = 42.99, p < .0001).

Given expressions are less prominent than new-er expressions: R-given expressions are rated as significantly less prominent than r-bridging, r-unused and r-new expressions, for both the native French speakers (t = 12.48, p < .0001; t = 16.88, p < .0001; t = 15.60, p < .0001) and the Catalan learners of French (t = 8.82, p < .0001; t = 15.34, p < .0001, t = 10.15, p < .0001). In addition, for the L2 speakers, r-unused expressions are rated as significantly more prominent than r-new expressions (t = 4.19, p < .001).

Given expressions are more prominent in L2 speakers: We find a significantly higher prominence score in the r-given category for the Catalan learners of French compared to the native French speakers (t = 5.51, p < .0001).

L-level. The results of the LMER show significant effects of PROFICIENCY (χ2(1) = 5.11, p < .05) and IS LEXICAL LEVEL (χ2(2) = 19.12, p < .0001). In general, l-given expressions are rated as significantly less prominent than l-new expressions (t = 4.27, p < .001). However, we did not find a significant interaction between PROFICIENCY and IS LEXICAL LEVEL.

3.2.2 Pitch accent types

Figure 4 shows the proportions of pitch accent types in each category at the referential level for the French speakers and the Catalan learners of French. In this section, whenever the number of comparisons is too large and the report of the statistical results is too extensive, we refer to the Appendix.

Figure 4. Types of pitch accent used to mark information status at the referential level.

Note: Low = L*, falling = H*L, HL*, initial accent = Hi, high = !H*, H*, rising = L*H, LH*, HH*.

R-level. The results of the GLMER show main effects of PITCH ACCENT TYPE (χ2(5) = 1927.25, p < .0001) and PROFICIENCY (χ2(1) = 6.75, p < .01) and significant interactions between PITCH ACCENT TYPE × PROFICIENCY (χ2(5) = 99.08, p < .0001), IS REFERENTIAL LEVEL × PROFICIENCY (χ2(3) = 13.98, p < .01) and PITCH ACCENT TYPE × IS REFERENTIAL LEVEL (χ2(15) = 804.22, p < .0001), as well as a three-way interaction between PITCH ACCENT TYPE × IS REFERENTIAL LEVEL × PROFICIENCY (χ2(15) = 64.74, p < .0001).

For the native French speakers, high pitch accents are predominant: The high accent is also the most frequently used accent type within all categories at the r-level (Appendix B1). In addition, the high accent type is significantly more frequently used with r-bridging, r-unused and r-new expressions than with r-given expressions (z = 6.20, p < .0001; z = 7.81, p < .0001; z = 9.43, p < .0001). Rising accents may have a special function: The rising accent type is used significantly more often with r-bridging expressions than with r-given and r-new expressions (z = 5.40, p < .0001; z = 3.20, p < .01).

For the Catalan learners of French, too, high accents are predominant: The high accent type is significantly more frequently used with r-bridging, r-unused and r-new expressions than with r-given expressions (z = 6.88, p < .0001; z = 11.98, p < .0001; z = 8.16, p < .0001). The high accent is the most frequently used accent type within the new-er r-categories, but within the r-given category, it is as frequently used as the Hi type and significantly more often than the rest of the pitch accent types, whose presence is rare (Appendix B2). Initial high accents and low accents are over-represented in r-given expressions: The initial and low accent types are each used significantly more often to mark r-given expressions than r-bridging, r-unused and r-new expressions (Hi: z = 3.09, p < .01; z = 4.86, p < .0001; z = 3.97, p < .001; low: z = 4.41, p < .0001; z = 5.43, p < .0001; z = 5.20, p < .0001).

Comparing L1 and L2 speakers, we find that Catalan learners of French tend to use the low accent type more frequently with all types of referential expressions (r-given z = 3.83, p < .001; r-bridging z = 2.54, p < .05; r-unused z = 4.37, p < .0001; and r-new z = 4.19, p < .001) and that they use the Hi more frequently with r-given expressions (z = 4.68, p < .0001).

L-level (figure in Appendix B3). The results of the GLMER indicate the main effects of PITCH ACCENT TYPE (χ2(5) = 1954.97, p < .0001) and PROFICIENCY (χ2(1) = 8.25, p < .01) and a significant interaction between PITCH ACCENT TYPE × PROFICIENCY (χ2(5) = 112.39, p < .001). No other significant interactions are observed.

Regardless of the l-level categories, for both L1 and L2 French speakers, the high accent type is generally more frequently used than the low accent type and the rising pitch accent type (z = 6.81, p < .0001; z = 5.95, p < .0001 and z = 4.91, p < .0001; z = 7.50, p < .0001). For the Catalan learners of French, we find a more frequent use of the low accent type compared to the initial and rising accent types (z = 6.52, p < .0001; z = 3.19, p < .01). In addition, the Catalan learners of French use significantly more low accents than native French speakers do (z = 4.20, p < .001).

3.2.3 Head gesture types

R-level. Figure 5 shows the proportions of head gestures used in each category at the r-level.

Figure 5. Types of head gesture used to mark information status at the referential level.

The results of the GLMER show a main effect of PROFICIENCY (χ2(1) = 6.60, p < .05) and HEAD GESTURE TYPE (χ2(5) = 4143.40, p < .0001), as well as significant interactions between HEAD GESTURE TYPE × IS REFERENTIAL LEVEL (χ2(15) = 279.50, p < .0001), HEAD GESTURE TYPE × PROFICIENCY (χ2(5) = 15.20, p < .01) and IS REFERENTIAL LEVEL × PROFICIENCY (χ2(3) = 27.50, p < .0001). No significant three-way interaction was observed.

Looking at French speakers and Catalan learners of French together, the r-given category is marked significantly less with all the types of gesture (Appendix C1). No neat pattern emerges from the comparison between head gesture types in both L1 and L2 speakers: Nods are the head gestures that emerge as more frequently used with both r-unused and r-new expressions, whereas protrusions and turns are to a larger extent associated with r-new expressions (Appendix C1). We find a significant difference between French speakers and Catalan learners of French in the r-unused category only (z = 2.42, p < .05). However, in terms of the type of head gesture, the significance of the interaction is solely due to differences when compared to the no-head gesture category.

L-level (figure in Appendix C2). The results of the GLMER show main effects of HEAD GESTURE TYPE (χ2(5) = 2299.61, p < .0001) and PROFICIENCY (χ2(1) = 7.46, p < .01), as well as significant interactions between HEAD GESTURE TYPE × IS LEXICAL LEVEL (χ2(10) = 25.16, p < .01) and HEAD GESTURE TYPE × PROFICIENCY (χ2(5) = 40.65, p < .0001). No other interactions were found to be significant.

Looking at French speakers and Catalan learners of French together, l-given words are significantly less marked with nods compared to l-accessible and l-new information (z = 3.19, p < .01; z = 2.58, p < .05). Nods, protrusions and turns are head gestures that most frequently mark l-new items in comparison with tilts and slides (Appendix C3). Other differences are due to the no-head gesture category.

4. Discussion

In this study, we examined whether native speakers and foreign language learners use prosodic and gestural cues, either separately or jointly, to mark information status (Research Question 1), and whether specific prominence levels, pitch accent types and/or head gesture types convey information status (Research Question 2). Our participants included native French speakers and Catalan learners of French, allowing us to contrast native and non-native speech in two languages traditionally considered to prioritize syntactic strategies over prosody in signalling information structure (e.g., Lambrecht, 1994; Vallduví, 1991).

Regarding the first research question, our findings reveal similarities between native and non-native French speakers at both referential and lexical levels. In both groups, given expressions are significantly less marked, while new-er expressions were more frequently marked by pitch accents only or by a combination of pitch accents and head gestures. These results suggest that, despite claims that French relies on syntactic cues for information structure and primarily uses prosody and head gestures to mark phrase structure (e.g., Delais-Roussarie & Rialland, Reference Delais-Roussarie, Rialland, Baauw, Drijkoningen and Pinto2007; Esteve-Gibert et al., Reference Esteve-Gibert, Borràs-Comes, Asor, Swets and Prieto2017), speakers of French combine syntactic and multimodal prosodic strategies to signal information status. This finding complements studies showing that prosody and gesture may mark contrastive focus in French (e.g., Beyssade et al., Reference Beyssade, Hemforth, Marandin, Portes, Frazier and Gibson2015; Esteve-Gibert et al., Reference Esteve-Gibert, Loevenbruck, Dohen and D’Imperio2021; German & D’Imperio, Reference German and D’Imperio2016). At the r-level, we observe that most r-given expressions are not marked, which is expected since this category is mainly composed of pronouns (90%). Specifically, 85% of the pronouns and 20% of full noun phrases within the r-given category are not marked by pitch accents. At the lexical level, approximately 25% of l-given expressions are not marked by either pitch accents or pitch accents + head gestures, indicating a substantial amount of deaccentuations. 
Differences between new-er categories are minimal, which aligns with findings from Im and Baumann (2020) and Rohrer (2022) on multimodal prosody and with prior research showing that manual gestures tend to co-occur with new or accessible information rather than given information (e.g., Debreslioska et al., 2013; Debreslioska & Gullberg, 2020, 2022; Ebert et al., 2011; Gullberg, 2003; Levy & Fowler, 2000; Levy & McNeill, 1992; Marslen-Wilson et al., 1982).

While both native French speakers and Catalan learners of French considerably reduce their use of pitch accents and head gestures with given referents, Catalan learners still mark more r-given expressions than native French speakers. This could be due to L1 transfer effects from Catalan, a language with high accentual density of content words (e.g., Ortega-Llebaria & Prieto, 2011). Similar L1 transfer effects have been observed in focus marking (Sánchez Alvarado & Armstrong, 2022; Fujimori et al., 2022; O’Brien & Gut, 2011; Ramírez-Verdugo, 2006; Yan & Calhoun, 2022), and Catalan learners may have overemphasised content words in French due to the lexical stress rule in their native language. To verify this hypothesis, future research should examine the prosodic marking of IS by native Catalan speakers in their L1. Whatever its source, our findings suggest that Catalan learners do not fully master multimodal prosody in French, as their expression of IS remains less differentiated than that of native speakers. Looking specifically at given information, we found no evidence of overexplicit use of head gestures by language learners, as was previously reported for manual gestures in Yoshioka (2008) and So et al. (2013). A more detailed analysis of gestures accompanying overtly specified referential expressions and personal pronouns could provide further insight. Moving forward, syntactic annotations of our corpus will help clarify the interaction between syntax and multimodal prosody in both native and non-native French speakers.

The second research question examined whether perceived prominence, specific pitch accent types and head gestures contribute to information status marking in native and non-native French speakers. Across both groups, given expressions at the referential and lexical levels are perceived as less prominent than new referents, though this effect is less pronounced than in West Germanic languages (e.g., Rohrer, 2022; Wagner & Watson, 2010). Entities previously mentioned in the discourse may be deaccented, but this largely depends on their morpho-syntactic form – whether they occur as pronouns or full noun phrases. Regarding pitch accent types, native French speakers use significantly fewer pitch accents with given referents across all types – low, high and rising. Because we do not find larger amounts of low or falling tones associated with given expressions, and despite the lower perceived prominence of given expressions at the r-level, there is no clear evidence of tonal compression. For new-er categories, the high pitch accent is the most frequently used, while initial high and low pitch accents are the least used. Interestingly, the rising pitch accent appears to be more strongly associated with r-bridging expressions, suggesting its role in emphasising accessible referents, such as when the native French speakers are describing their best friend’s physical features. These findings contrast with previous studies on West Germanic languages, where a more fine-grained correspondence exists between pitch accent types and information categories (Baumann & Riester, 2013; Röhr & Baumann, 2011). For head gestures, nods, protrusions and turns are more commonly associated with new information at both the referential and lexical levels. Nods also co-occur significantly more often with accessible (r-unused, l-accessible) expressions.
However, other differences in gesture type proportions are not significant, suggesting that head gesture type itself may not be the primary determinant of prominence. Instead, gestural movement amplitude may play a crucial role in prominence perception – a factor not considered in this study but worth investigating further.

Regarding perceived prominence, pitch accent types and head gesture types in Catalan learners of French, we observe patterns similar to native speakers. Learners differentiate between given and new referential and lexical expressions in terms of prominence, though their given expressions are still perceived as more prominent than those of native French speakers. Interestingly, they mark r-unused expressions as more prominent than r-new expressions. For pitch accent types, Catalan learners follow a trend similar to native speakers, particularly in the substantial percentage of high pitch accents on non-given information. However, they differ in their increased use of the high initial (i.e., prenuclear) accent on given referents. This may be explained by the large number of subject pronouns (classified by default as r-given). In Catalan, subject pronouns are usually elided when inferable, unlike in French. Aware of this contrast, learners may emphasise subject pronouns as a form of overcorrection. Additionally, in line with our expectations, Catalan learners consistently produce more low accents than native speakers do. This is likely due to L1 transfer: low accents are common in Catalan, often occurring as nuclear accents in broad focus statements (Prieto, 2014), and learners may have carried this pattern over into their French narratives. To confirm this hypothesis, comparisons with L1 Catalan productions are needed. For head gestures, most information structural categories were not marked by a specific gesture type. The only exception was an increased use of nods on new-er referential expressions, which may indicate that learners associate nodding with higher prominence.

Our findings provide insight into the interaction between prosody and gesture, particularly the role of head gestures. The absence of significant differences between pitch accents only and pitch accent + head gestures in marking new-er expressions suggests that these cues may not function as entirely independent prominence markers. Additionally, the fact that head gestures marking referents are rarely used without an accompanying pitch accent further reinforces the idea that pitch accents are the primary cue for prominence, with head gestures serving a supporting role. These results align with previous findings on manual gestures (Im & Baumann, 2020), confirming that prominence marking is inherently multimodal, and to some extent supporting general claims on the tight synchrony between prosody and gesture (e.g., Ambrazaitis & House, 2017; Loehr, 2012; McNeill, 1992; Jannedy & Mendoza-Denton, 2005; Kendon, 2004; McClave, 2000). Despite this interplay, the distribution of pitch accents and head gestures is not equivalent, indicating that their function may also differ. Similar to the eyebrow beats observed in Ambrazaitis and House (2017), which amplify other prominence cues such as head beats, the head gestures in our dataset may primarily act as intensifiers of pitch accents. From a strictly multimodal perspective, as argued by Ambrazaitis and House (2017), our head gestures and pitch accents would not qualify as truly cumulative, unlike the head and eyebrow movements described by Swerts and Krahmer (2010), which are considered functionally equivalent cues for multimodal prominence.
Given that head gestures rarely occur in isolation while pitch accents frequently do, our data suggest that pitch accents remain the default cue for prominence marking, even more so for learners than for native speakers of French. Consequently, head gestures appear to function as supplementary cues, enhancing the prominence of new or important information. However, only a perception study could determine whether listeners actually perceive a higher degree of prominence when head gestures and pitch accents co-occur, compared to pitch accents alone.

5. Conclusion

This study examined how native French speakers and Catalan learners of French use prosody and head gestures to signal different levels of givenness in narrative speech. Our findings indicate that both groups employ similar multimodal strategies to mark information status, reinforcing the idea that prosody plays a role in information structure marking in Romance languages, as suggested in previous research (Forcadell, 2016 for Catalan; Portes & Reyle, 2022 for French, among many others). The relatively large number of deaccentuations in our data parallels findings from studies on West Germanic languages, suggesting that deaccentuation may be more widespread across language families than previously thought. The comparison between native and non-native speakers reveals notable differences. Catalan learners of French tend to produce pitch accents on given expressions more frequently than native speakers, which may indicate an incomplete mastery of the target language’s prosodic marking of information status. In addition, the data point to L1 transfer, as learners frequently use low pitch accents, a common feature in Catalan prosody, in their French productions. Regarding head gestures, our results show that referential and non-referential expressions marked by gestures are generally accompanied by prosodic prominence. However, the type of head gesture does not seem to play a decisive role in signalling meaning differences in discourse.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/langcog.2025.11.

Acknowledgments

The first author is a Serra Húnter Fellow and acknowledges funding from the European Union-NextGenerationEU and the Spanish Ministry of Universities through the Recovery, Transformation and Resilience Plan, via a call from Pompeu Fabra University (Barcelona) at the time the research was conducted. This work was supported by the German Research Foundation (DFG) as part of the SFB1252 Prominence in Language (Project-ID 281511265), specifically project A07 Metrical Prominence – Scales and Structures. We sincerely thank Dr. Patrick Rohrer for his valuable support and contributions to this project.

Footnotes

1 In the present study, we adopt the definition of ‘foreign language’ as a language that has not been acquired from birth. This perspective characterises a foreign language as one learned subsequent to the acquisition of a person’s first or native language.

2 As there is no occurrence of head gesture only with l-accessible words, the analysis could not be performed for this category.

References

Alexanderson, S., House, D., & Beskow, J. (2013). Aspects of co-occurring syllables and head nods in spontaneous dialogue. In Proceedings of the 12th international conference on Auditory-Visual Speech Processing (AVSP2013) (pp. 169–172). https://www.isca-speech.org/archive/avsp_2013/alexanderson13_avsp.html
Ambrazaitis, G., & House, D. (2017). Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings. Speech Communication, 95, 100–113. https://doi.org/10.1016/j.specom.2017.08.008
Azar, Z., Backus, A., & Ozyurek, A. (2017). Highly proficient bilinguals maintain language-specific pragmatic constraints on pronouns: Evidence from speech and gesture. In Gunzelmann, G., Howes, A., Tenbrink, T., & Davelaar, E. (Eds.), Proceedings of the 39th annual conference of the cognitive science society (CogSci 2017) (pp. 81–86). Austin, TX: Cognitive Science Society.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Baumann, S., & Riester, A. (2013). Coreference, lexical givenness and prosody in German. Lingua, 136, 16–37. https://doi.org/10.1016/j.lingua.2013.07.012
Baumann, S., & Röhr, C. (2015). The perceptual prominence of pitch accent types in German. In Proceedings of the 18th international congress of phonetic sciences (paper number 298). University of Glasgow. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/proceedings.html
Beyssade, C., Hemforth, B., Marandin, J.-M., & Portes, C. (2015). Prosodic realizations of information focus in French. In Frazier, L. & Gibson, E. (Eds.), Explicit and implicit prosody in sentence processing (pp. 39–61). Springer.
Boersma, P., & Weenink, D. (2022). Praat: Doing phonetics by computer [Computer program]. Version 6.2.06. Retrieved 15 March 2022 from https://www.praat.org.
Brunetti, L., Bott, S., Costa, J., & Vallduví, E. (2009). A multilingual annotated corpus for the study of Information Structure 1. Grammatik und Korpora 2009. Dritte internationale Konferenz, Mannheim, 22.–24.09.2009. Grammar & corpora 2009, 2011. https://shs.hal.science/halshs-01823541v1
Büring, D. (2007). Intonation, semantics and information structure. In Ramchand, G. & Reiss, C. (Eds.), The Oxford handbook of linguistic interfaces (pp. 445–474). Oxford: Oxford University Press.
Calhoun, S. (2007). Information structure and the prosodic structure of English: A probabilistic relationship. PhD dissertation, University of Edinburgh. EThOS.
Calhoun, S. (2010a). The centrality of metrical structure in signaling information structure: A probabilistic perspective. Language, 86, 1–42. https://doi.org/10.1353/lan.0.0197
Calhoun, S. (2010b). How does informativeness affect prosodic prominence? Language and Cognitive Processes, 25, 1099–1140. https://doi.org/10.1080/01690965.2010.491682
Chafe, W. (1976). Givenness, contrastiveness, definiteness, subjects, and topics. In Li, C. (Ed.), Subject and topic (pp. 27–55). Academic Press.
Chafe, W. (1994). Discourse, consciousness, and time. University of Chicago Press.
Cole, J. (2015). Prosody in context: A review. Language, Cognition and Neuroscience, 30, 1–31. https://doi.org/10.1080/23273798.2014.963130
Cruttenden, A. (2006). The deaccenting of given information: A cognitive universal? In Bernini, G. (Ed.), The pragmatic organisation of discourse (pp. 311–355). Mouton de Gruyter.
Dahan, D., Tanenhaus, M. K., & Chambers, C. G. (2002). Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language, 47, 292–314. https://doi.org/10.1016/S0749-596X(02)00001-3
Debreslioska, S., & Gullberg, M. (2020). What’s new? Gestures accompany inferable rather than brand-new referents in discourse. Frontiers in Psychology, 11, 1935. https://doi.org/10.3389/fpsyg.2020.01935
Debreslioska, S., & Gullberg, M. (2022). Information status predicts the incidence of gesture in discourse: An experimental study. Discourse Processes. https://doi.org/10.1080/0163853X.2022.2085476
Debreslioska, S., Özyürek, A., Gullberg, M., & Perniss, P. (2013). Gestural viewpoint signals referent accessibility. Discourse Processes, 50, 431–456. https://doi.org/10.1080/0163853X.2013.824286
Delais-Roussarie, E., & Rialland, A. (2007). Metrical structure, tonal association and focus in French. In Baauw, S., Drijkoningen, F., & Pinto, M. (Eds.), Romance languages and linguistic theory 2005: Selected papers from ‘Going Romance’ (pp. 73–98). John Benjamins.
Delais-Roussarie, E., Rialland, A., Doetjes, J., & Marandin, J.-M. (2002). The Prosody of post-focus sequences in French. In Bel, B. & Marlien, I. (Eds.), Proceedings of speech prosody 2002 (pp. 239–242). http://sprosig.org/sp2002/papers.htm
Delais-Roussarie, E., et al. (2015). Intonational phonology of French: Developing a ToBI system for French. In Frota, S. & Prieto, P. (Eds.), Intonation in Romance (pp. 63–100). OUP. https://doi.org/10.1093/acprof:oso/9780199685332.003.0003
Di Cristo, A. (1998). Intonation in French. In Hirst, D. J. & Di Cristo, A. (Eds.), Intonation systems: A survey of twenty languages (pp. 195–218). Cambridge University Press.
Di Cristo, A. (2000). Vers une modélisation de l’accentuation du français (seconde partie). Journal of French Language Studies, 10(1), 27–44.
Dimroth, C. & Starren, M. (Eds.). (2003). Information structure and the dynamics of language acquisition. John Benjamins. https://doi.org/10.1075/sibil.26
Dohen, M., & Loevenbruck, H. (2004). Pre-focal rephrasing, focal enhancement and postfocal deaccentuation in French. Proceedings of Interspeech, 2004, 785–788. https://www.isca-speech.org/archive/interspeech_2004/dohen04_interspeech.html
Ebert, C., Evert, S., & Wilmes, K. (2011). Focus marking via gestures. In Reich, I., Horch, E., & Pauly, D. (Eds.), Proceedings of Sinn und Bedeutung 15 (pp. 193–208). https://ojs.ub.uni-konstanz.de/sub/index.php/sub/article/view/372
Esteve-Gibert, N., Borràs-Comes, J., Asor, E., Swets, M., & Prieto, P. (2017). The timing of head movements: The role of prosodic heads and edges. Journal of the Acoustical Society of America, 141, 4727–4739. https://doi.org/10.1121/1.4986649
Esteve-Gibert, N., Loevenbruck, H., Dohen, M., & D’Imperio, M. P. (2021). Pre-schoolers use head gestures rather than prosodic cues to highlight important information in speech. Developmental Sciences, 25(1), e13154. https://doi.org/10.1111/desc.13154
Esteve-Gibert, N., & Prieto, P. (2013). Prosodic structure shapes the temporal realization of intonation and manual gesture movements. Journal of Speech, Language, and Hearing Research, 56, 850–864. https://doi.org/10.1044/1092-4388(2012/12-0049)
Féry, C. (2001). Focus and phrasing in French. In Féry, C. & Sternefeld, W. (Eds.), Audiatur Vox Sapientiae: A Festschrift for Arnim von Stechow (pp. 153–181). Akademie Verlag. https://doi.org/10.1515/9783050080116.153
Féry, C. (2014). Final compression in French as a phrasal phenomenon. In Bourns, S. Katz & Myer, L. L. (Eds.), Perspectives on linguistic structure and context (pp. 133–156). Benjamins. https://doi.org/10.1075/pbns.244.07fer
Féry, C., & Ishihara, I. (2014). The Oxford handbook of information structure. Oxford University Press.
Flecha-García, M. L. (2010). Eyebrow raises in dialogue and their relation to discourse structure, utterance function and pitch accents in English. Speech Communication, 52, 542–554. https://doi.org/10.1016/j.specom.2009.12.003
Forcadell, M. (2016). New prosodic patterns in Catalan: Information status and (de)accentability. Journal of Pragmatics, 97, 1–20. https://doi.org/10.1016/j.pragma.2016.03.007
Fujimori, A., Yamane, N., Yoshimura, N., Nakayama, M., Teaman, B., & Yoneyama, K. (2022). Development of L2 prosody: The case of information focus. In Leal, T., Shimanskaya, E., & Isabelli, C. A. (Eds.), Generative SLA in the age of minimalism: Features, interfaces, and beyond. Selected proceedings of the 15th generative approaches to second language acquisition conference (pp. 137–156). John Benjamins. https://doi.org/10.1075/lald.67.06fuj
German, J.-S., & D’Imperio, P. (2016). The status of the initial rise as a marker of focus in French. Language and Speech, 59(2), 165–195. https://doi.org/10.1177/0023830915583082
Goldman, J.-P. (2011). EasyAlign: An automatic phonetic alignment tool under Praat. Proceedings of Interspeech 2011 (pp. 3233–3236). ISCA Archive. https://doi.org/10.21437/Interspeech.2011-815
Gregori, A., Sánchez-Ramón, P. G., Prieto, P., & Kügler, F. (2024). Prosodic and gestural marking of focus types in Catalan and German. In Chen, Y., Chen, A., & Arvaniti, A. (Eds.), Proceedings of speech prosody 2024 (pp. 891–895). ISCA Archive. https://doi.org/10.21437/SpeechProsody.2024-180
Gullberg, M. (2003). Gestures, referents, and anaphoric linkage in learner varieties. In Dimroth, C. & Starren, M. (Eds.), Information structure, linguistic structure and the dynamics of language acquisition (pp. 311–328). Benjamins. https://doi.org/10.1075/sibil.26.15gul
Gullberg, M. (2006). Handling discourse: Gestures, reference tracking, and communication strategies in early L2. Language Learning, 56, 155–196. https://doi.org/10.1111/j.0023-8333.2006.00344.x
Gut, U., & Pillai, S. (2014). Prosodic marking of information structure by Malaysian speakers of English. Studies in Second Language Acquisition, 36, 283–302. https://doi.org/10.1017/S0272263113000739
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38, 201–223. https://doi.org/10.2307/3588378
Halliday, M. A. K. (1967). Intonation and grammar in British English. Mouton.
Hamlaoui, F. (2009). La focalisation à l’interface de la syntaxe et de la phonologie: le cas du français dans une perspective typologique. Ph.D. dissertation, Paris 3 Sorbonne Nouvelle. Theses.fr.
Hendriks, H. (2003). Using nouns for reference maintenance: A seeming contradiction in L2 discourse. In Ramat, Anna G. (Ed.), Typology and second language acquisition (pp. 291–326). Mouton de Gruyter.
Hirschberg, J. (1993). Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence, 63, 305–340. https://doi.org/10.1016/0004-3702(93)90020-C
Holler, J., & Bavelas, J. (2017). Multimodal communication of common ground: A review of social functions. In Church, R. B., Alibali, M. W. & Kelly, S. D. (Eds.), Why gesture? How the hands function in speaking, thinking and communicating (pp. 213–240). John Benjamins. https://doi.org/10.1075/gs.7.11hol
Hosseini, A. (2013). L1 interference in L2 prosody: Contrastive focus in Japanese and Persian. Language and Information Sciences, 11, 55–67.
Hualde, J. I., & Prieto, P. (2016). Towards an international prosodic alphabet (IPrA). Laboratory Phonology, 7, 5. https://doi.org/10.5334/labphon.11
Im, S., & Baumann, S. (2020). Probabilistic relation between co-speech gestures, pitch accents and information status. Proceedings of the LSA, 5, 685–697. https://doi.org/10.3765/plsa.v5i1.4755.
Jackendoff, R. (1972). Semantic interpretation in generative grammar. MIT Press.
Jannedy, S., & Mendoza-Denton, N. (2005). Structuring information through gesture and intonation. In Interdisciplinary studies on information structure: ISIS; working papers of the SFB 632, 3, 199–244. http://opus.kobv.de/ubp/volltexte/2006/877/
Jun, S.-A., & Fougeron, C. (2000). A phonological model of French intonation. In Botinis, A. (Ed.), Intonation: Analysis, modeling and technology (pp. 209–242). Springer. https://doi.org/10.1007/978-94-011-4317-2_10
Jung, E. H. (2004). Topic and subject prominence in interlanguage development. Language Learning, 54, 713–738.
Karpiński, M., Jarmołowicz-Nowikow, E., & Malisz, Z. (2009). Aspects of gestural and prosodic structure of multimodal utterances in Polish task-oriented dialogues. Speech and Language Technology, 11, 113–122.
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.
Klein, W. (2012). The information structure of French. In Krifka, M. & Musan, R. (Eds.), The expression of information structure (pp. 95–126). de Gruyter.
Krahmer, E., & Swerts, M. (2007). The effects of visual beats on prosodic prominence: Acoustic analyses, auditory perception and visual perception. Journal of Memory and Language, 57, 396–414. https://doi.org/10.1016/j.jml.2007.06.005
Krahmer, E., & Swerts, M. (2009). Audiovisual Prosody–Introduction to the Special Issue. Language and Speech, 52(2-3), 129–133. https://doi.org/10.1177/0023830909103164
Kügler, F., Baumann, S., & Röhr, C. T. (2022). Deutsche Intonation, Modellierung und Annotation (DIMA) – Richtlinien zur prosodischen Annotation des Deutschen. In Schwarze, C. & Grawunder, S. (Eds.), Transkription und Annotation gesprochener Sprache und multimodaler Interaktion (pp. 23–54). Narr.
Kügler, F., & Calhoun, S. (2020). Prosodic encoding of information structure: A typological perspective. In Gussenhoven, C. & Chen, A. (Eds.), The Oxford handbook of language prosody (pp. 454–467). Oxford Academic. https://doi.org/10.1093/oxfordhb/9780198832232.013.30
Ladd, R. (2008). Intonational phonology. Cambridge University Press.
Lambrecht, K. (1994). Information structure and sentence form: Topic, focus and the mental representations of discourse referents. Cambridge University Press.
Lenth, R. V. (2021). Emmeans: Estimated marginal means, aka least-squares means. https://cran.r-project.org/package=emmeans
Leonard, T., & Cummins, F. (2011). The temporal relation between beat gestures and speech. Language and Cognitive Processes, 26, 1457–1471. https://doi.org/10.1080/01690965.2010.500218
Levy, E. T., & Fowler, C. A. (2000). The role of gestures and other graded language forms in the grounding of reference. In McNeill, D. (Ed.), Language and gesture (pp. 215–234). Cambridge University Press. https://doi.org/10.1017/cbo9780511620850.014
Levy, E. T., & McNeill, D. (1992). Speech, gesture, and discourse. Discourse Processes, 15, 277–301. https://doi.org/10.1080/01638539209544813
Loehr, D. P. (2012). Temporal, structural, and pragmatic synchrony between intonation and gesture. Laboratory Phonology, 3, 71–89. https://doi.org/10.1515/lp-2012-0006
Lozano, C. & Callies, M. (2018). Word order and information structure in advanced SLA. In Malovrh, P. A. & Benati, A. G. (Eds.), The handbook of advanced proficiency in second language acquisition. Wiley. https://doi.org/10.1002/9781119261650.ch22
Lücking, A., Ptock, S., & Bergmann, K. (2012). Assessing agreement on segmentations by means of Staccato, the segmentation agreement calculator according to Thomann. In Efthimiou, E., Kouroupetroglou, G., & Fotinea, S. E. (Eds.), Gesture and sign language in human-computer interaction and embodied communication. GW 2011. Lecture notes in computer science, 7206. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-34182-3_12
Ludusan, B., Schröer, M., Rossi, M., & Wagner, P. (2023). The co-use of laughter and head gestures across speech styles. Proceedings of Interspeech, 2023, 3592–3596. https://doi.org/10.21437/Interspeech.2023-240
Marslen-Wilson, W. D., Levy, E., & Komisarjevsky Tyler, L. (1982). Producing interpretable discourse: The establishment and maintenance of reference. In Jarvella, R. J. & Klein, W. (Eds.), Language, place, and action: Studies in deixis and related topics (pp. 339–378). Wiley.
McClave, E. Z. (2000). Linguistic functions of head movements in the context of speech. Journal of Pragmatics, 32(7), 855–878. https://doi.org/10.1016/S0378-2166(99)00079-X
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of Chicago Press.
Mennen, I. (2007). Phonological and phonetic influences in non-native intonation. In Trouvain, J. & Gut, U. (Eds.), Non-native prosody: Phonetic descriptions and teaching practice (pp. 53–76). Mouton De Gruyter.
Mertens, P. (2008). Syntaxe, prosodie et structure informationnelle: une approche prédictive pour l’analyse de l’intonation dans le discours. Travaux de Linguistique, 56(1), 87–124. https://doi.org/10.3917/tl.056.0097
Michelas, A., & German, J.-S. (2020). Focus marking and prosodic boundary strength in French. Phonetica, 77(4), 244–267. https://doi.org/10.1159/000499071
Mok, I., van Maastricht, L., & Esteve-Gibert, N. (2022). Do head gestures function as precursors for prosodic focus marking in the L2? In Proceedings of the speech prosody conference 2022 (pp. 67–71). https://www.isca-speech.org/archive/pdfs/speechprosody_2022
O’Brien, M., & Gut, U. (2011). Phonological and phonetic realisation of different types of focus in L2 speech. In Wrembel, M., Kul, M., & Dziubalska-Kolaczyk, K. (Eds.), Achievements and perspectives in SLA of speech: New sounds 2010, Vol. 1 (pp. 275–286). Peter Lang.
Ortega-Llebaria, M., & Colantoni, L. (2014). L2 English intonation, relations between form-meaning association, access to meaning, and L1 transfer. Studies in Second Language Acquisition, 36, 331–353. https://doi.org/10.1017/S0272263114000011
Ortega-Llebaria, M., & Prieto, P. (2011). Acoustic correlates of stress in central Catalan and Castilian Spanish. Language and Speech, 54(1), 73–97. https://doi.org/10.1177/0023830910388014
Portes, C., & Reyle, U. (2022). Combining syntax and prosody to signal information structure: The case of French. In Proceedings of speech prosody 2022. Lisbon. https://doi.org/10.21437/SpeechProsody.2022-18
Prieto, P. (2014). The intonational phonology of Catalan. In Jun, Sun-Ah (Ed.), Prosodic typology II: The phonology of intonation and phrasing (pp. 43–80). Oxford University Press.
Prince, E. (1981). Toward a taxonomy of given-new information. In Cole, P. (Ed.), Radical pragmatics (pp. 223–255). Academic Press.
Prince, E. (1992). The ZPG Letter: Subjects, definiteness, and information-status. In Thompson, S. A. & Mann, W. C. (Eds.), Discourse description: Diverse analyses of a fund raising text (pp. 295–325). John Benjamins. https://doi.org/10.1075/pbns.16.12pri
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org
Ramírez-Verdugo, D. (2002). Non-native interlanguage intonation systems: A study based on a computerized corpus of Spanish learners of English. ICAME Journal, 26, 115–132.
Ramírez-Verdugo, D. (2006). A study of intonation awareness and learning in non-native speakers of English. Language Awareness, 15, 141159. https://doi.org/10.2167/la404.0Google Scholar
Rasier, L., Caspers, J., & van Heuven, V. J. (2010). Accentual marking of information status in Dutch and French as foreign languages. Production and perception. In Dziubalska-Kołaczyk, K., Wrembel, M. & Kul, M. (Eds.), Proceedings of the 6th international symposium on the acquisition of second language speech (pp. 379385). New Sounds.Google Scholar
Riester, A., & Baumann, S. (2013). Focus triggers and focus types from a corpus perspective. Dialogue and Discourse, 4 (2), 215248. https://doi.org/10.5087/dad.2013.210CrossRefGoogle Scholar
Riester, A., & Baumann, S. (2017). The RefLex scheme – Annotation guidelines. SinSpeC. Working Papers of the SFB 732 , vol. 14. University of Stuttgart.Google Scholar
Röhr, C., & Baumann, S. (2011). Decoding information status by type and position of accent in German. In Lee, W.-S., & Zee, E. (Eds.), Proceedings of the 17th international conference on phonetic sciences (pp. 17061709). Department of Chinese, Translation and Linguistics, City University of Hong Kong.Google Scholar
Rohrer, P. (2022). A temporal and pragmatic analysis of gesture-speech association: A corpus-based approach using the novel MultiModal MultiDimension (M3D) labeling system. PhD dissertation, Universitat Pompeu Fabra. TDX.
Rohrer, P., Tütüncübasi, U., Vilà-Giménez, I., Florit-Pons, J., Esteve-Gibert, N., Ren-Mitchell, A., Shattuck-Hufnagel, S., & Prieto, P. (2023). The MultiModal MultiDimensional (M3D) labeling system. Open Science Framework. Retrieved from: https://doi.org/10.17605/OSF.IO/ANKDX
Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1, 75–116. https://doi.org/10.1007/BF02342617
Sánchez Alvarado, C., & Armstrong, M. (2022). Prosodic marking of object focus in L2 Spanish. Studies in Hispanic and Lusophone Linguistics, 15, 211–250.
Schwarzschild, R. (1999). GIVENness, AvoidF and other constraints on the placement of focus. Natural Language Semantics, 7, 141–177. https://doi.org/10.1023/A:1008370902407
Selkirk, E. (1995). Sentence prosody: Intonation, stress and phrasing. In Goldsmith, J. A. (Ed.), The handbook of phonological theory (pp. 550–569). Blackwell.
Shattuck-Hufnagel, S., & Ren, A. (2018). The prosodic characteristics of non-referential co-speech gestures in a sample of academic-lecture-style speech. Frontiers in Psychology, 9, Article 1514. https://doi.org/10.3389/fpsyg.2018.01514
So, W. C., Kita, S., & Goldin-Meadow, S. (2013). When do speakers use gestures to specify who does what to whom? The role of language proficiency and type of gestures in narratives. Journal of Psycholinguistic Research, 42, 581–594. https://doi.org/10.1007/s10936-012-9230-6
Swerts, M., & Krahmer, E. J. (2010). Visual prosody of newsreaders: Effects of information structure, emotional content and intended audience on facial expressions. Journal of Phonetics, 38, 197–206. https://doi.org/10.1016/j.wocn.2009.10.002
Swerts, M., Krahmer, E. J., & Avesani, C. (2002). Prosodic marking of information status in Dutch and Italian. Journal of Phonetics, 30, 629–654. https://doi.org/10.1006/jpho.2002.0178
Terken, J., & Hirschberg, J. (1994). Deaccentuation of words representing “given” information: Effects of persistence of grammatical function and surface position. Language and Speech, 37, 125–145. https://doi.org/10.1177/002383099403700202
Thomann, B. (2001). Observation and judgment in psychology: Assessing agreement among markings of behavioral events. Behavior Research Methods, Instruments, & Computers, 33(3), 339–348. https://doi.org/10.3758/BF03195387
Türk, O. (2020). Gesture, prosody, and information structure synchronization in Turkish. PhD dissertation, Victoria University of Wellington. http://hdl.handle.net/10063/9231
Vallduví, E. (1991). The role of plasticity in the association of focus and prominence. Eastern States Conference in Linguistics, 7, 295–306.
Vallduví, E. (1993). The Informational Component. IRCS Technical Reports Series, 188. http://repository.upenn.edu/ircs_reports/188
Vallduví, E. (2008). L’oració com a unitat informativa. In Solà, J. et al. (Eds.), Gramàtica del català contemporani, vol. 2 (pp. 1221–1279). Empúries.
van Maastricht, L., Krahmer, E., & Swerts, M. (2016a). Native speaker perceptions of (non-) native prominence patterns: Effects of deviance in pitch accent distributions on accentedness, comprehensibility, intelligibility, and nativeness. Speech Communication, 83, 21–33. https://doi.org/10.1016/j.specom.2016.07.008
van Maastricht, L., Krahmer, E., & Swerts, M. (2016b). Prominence patterns in a second language: Intonational transfer from Dutch to Spanish and vice versa. Language Learning, 66, 124–158. https://doi.org/10.1111/lang.12141
Vanrell, M. M., & Fernández-Soriano, O. (2013). Variation at the interfaces in Ibero-Romance. CatJL, 12, 253–282. https://doi.org/10.5565/rev/catjl.63
Wagner, M., & Watson, D. (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes, 25(7–9), 905–945. https://doi.org/10.1080/01690961003589492
Wagner, P., Malisz, Z., & Kopp, S. (2014). Gesture and speech in interaction: An overview. Speech Communication, 57, 209–232. https://doi.org/10.1016/j.specom.2013.09.008
Watson, D. G., Arnold, J. E., & Tanenhaus, M. K. (2005, March). Not just given and new: The effect of discourse and task based constraints on acoustic prominence. Poster presented at the 2005 CUNY Human Sentence Processing Conference, Tucson, AZ.
Wennerstrom, A. (1998). Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition, 20, 1–25. https://doi.org/10.1017/S0272263198001016
Winter, B. (2020). Statistics for linguists. An introduction using R. Routledge.
Yan, M., & Calhoun, S. (2022). Prosodic prominence and clefting in L2 focus interpretation. In Proceedings of speech prosody 2022 (pp. 901–905). https://doi.org/10.21437/SpeechProsody.2022-183
Yasinnik, Y., Renwick, M., & Shattuck-Hufnagel, S. (2004). The timing of speech-accompanying gestures with respect to prosody. The Journal of the Acoustical Society of America, 115, 2397. https://doi.org/10.1121/1.4780717
Yoshioka, K. (2008). Gesture and information structure in first and second language. Gesture, 8, 236–255. https://doi.org/10.1075/gest.8.2.07yos
Table 1. RefLex annotation labels used in this study. Words and phrases in boldface are examples of the respective labels

Figure 1. ELAN screenshot with the annotation layers for text (word, syllable), head gesture (type, apex), prosody (phrase ends, pitch accents and boundary tones) and perceived prominence levels.

Table 2. Number of expressions annotated at the referential level in each category and total number of pitch accents and head gestures marking these expressions

Figure 2. Marking of information status (IS) at the referential level.

Table 3. Number of expressions annotated at the lexical level in each category and number of pitch accents and head gestures marking these expressions

Figure 3. Marking of information status (IS) at the lexical level.

Table 4. Mean perceived prominence and standard deviation for the information status categories at the r- and l-levels

Figure 4. Types of pitch accent used to mark information status at the referential level. Note: Low = L*, falling = H*L, HL*, initial accent = Hi, high = !H*, H*, rising = L*H, LH*, HH*.

Figure 5. Types of head gesture used to mark information status at the referential level.

Supplementary material: File

Baills and Baumann supplementary material
