Introduction
How information-seeking questions (also known as wh-questions) are interpreted has received considerable attention in the sentence-processing literature. Most studies focus on long-distance dependencies, examining how fronted wh-phrases are interpreted at their canonical position and the effects that result from keeping the fronted wh-phrase in the parser’s working memory until the open dependency is resolved. Standard “filled-gap effects” (Crain & Fodor, 1985; Stowe, 1986; see Pablos, 2008 for an overview) are associated with reading time evidence showing that readers expect the wh-phrase (the filler) to be discharged and interpreted at the first available grammatical position. Failure to discharge the wh-phrase at that position results in longer reading times than in the declarative counterparts. In addition, there is growing evidence that the interpretation of sentence meaning is achieved incrementally, with comprehenders predicting upcoming information (including lexical and syntactic structure) based on the available input (e.g., Altmann & Kamide, 1999; Levy, 2008). Under prediction accounts, projecting a wh-gap in fronted wh-questions occurs at the moment the wh-phrase is encountered.
The above scenario cannot be directly extended to wh-questions where the wh-phrase stays in its canonical position, known as wh-in-situ questions. The interpretation of these questions may result in a temporary syntactic ambiguity in comparison with their declarative counterparts that have a non-wh-word at the same site. Research shows that available contextual and prosodic information is used to predict the upcoming structure (Fodor, 2002; Déprez et al., 2013; Gryllia et al., 2016; Gryllia et al., 2020; Yang et al., 2019; Kawahara et al., 2022), practically resolving the ambiguity before the wh-phrase is encountered. Nonetheless, this raises the question of how such wh-questions are parsed and to what extent readers anticipate an upcoming in-situ wh-question in the absence of other information (e.g., prosody, context, information structure) during reading. These questions form the focus of this study. More specifically, we examine how speakers of two wh-in-situ languages, Mandarin Chinese and French, proceed in the real-time reading of sentences presented without any preceding contextual or prosodic cues that could bias interpretation toward a question or a declarative statement. These two languages differ in the types of strategies they permit for wh-questions. Whereas Mandarin wh-questions are always in situ, French permits both in-situ and fronted wh-questions. We investigate whether this variation in wh-in-situ strategies influences processing difficulty and predictability. We also examine an additional factor that might influence the parsing of wh-in-situ questions, namely, the complexity of the wh-phrase (i.e., a simplex wh-phrase such as who or a complex wh-phrase such as which person). Our second research question thus addresses whether there are processing differences between complex and simplex wh-phrases in in-situ wh-questions.
Mandarin Chinese and French
Question formation strategies across languages
Languages differ in the number and type of strategies for forming wh-questions (see, for instance, Cheng, 1991) and can be categorized into three primary groups. The first type obligatorily fronts wh-words in wh-questions, as in English (1).Footnote 1, Footnote 2
(1) Who_i did you meet t_i at the art museum yesterday?
The second type consists of languages that always retain wh-words in their canonical position (i.e., in-situ) when formulating a wh-question, as in Mandarin Chinese (2).

The third language type permits both fronting and in-situ wh-question formation, as in French illustrated in (3a) and (3b).Footnote 3, Footnote 4


Adli (2015) examined the prevalence and distribution of wh-in-situ questions in relation to other variants of wh-question formation in French. He presented an assessment of spontaneous speech in French obtained from the Sgs database (with 10943 sentences) and showed that 56.2% of the total number of 1721 interrogative utterances (excluding echo-questions) are in-situ wh-questions. The study further found that the relative frequency of these in-situ questions is 0.62 for wh-adjuncts and 0.43 for wh-objects. A more recent study found an increase in the use of wh-in-situ in the last decade (Baunaz & Bonan, 2023).
These different strategies pose interesting questions regarding the processes used in the online comprehension of wh-in-situ constructions where the clause type of the sentence (question or declarative) is only obvious when the wh-word is encountered. If English were to permit in-situ wh-questions like Mandarin and French, a comparison of (4) and (5) illustrates that the difference between the wh-in-situ question (4) and the declarative sentence (5) is only revealed at the postverbal object position (see Note 3).
(4) (You said) Peter would like to meet whom tomorrow?
(5) (You said) Peter would like to meet a friend tomorrow.
Crucially, unless prosodic or contextual information is available, no distinction can be made between these two sentences by readers (up to the object position) as they proceed incrementally in the interpretation.
The syntax and processing of in-situ wh-questions: previous studies
Syntactic studies of in-situ wh-questions analyze these questions as involving a covert dependency, such that the in-situ wh-phrase is either related to an interrogative operator or raised to the structurally higher operator position (SpecCP position) at Logical Form (LF; for further discussion see Aoun & Li, 1993; Cheng, 1991, 2003, 2009; Huang, 1982; Tsai, 1994; and Bayer & Cheng, 2017 for an overview). The covert dependency is thus on par with overt dependencies in questions with overt wh-fronting in that it involves a syntactic representation where a (covert) operator is in the structurally higher position (i.e., SpecCP), which determines the clause type of the sentence.Footnote 5 This in turn raises an interesting question concerning the representation of in-situ wh-questions in the processing system. If the same processing mechanisms are used in processing dependencies, the abstract link between the wh-phrase and the SpecCP position in the case of wh-in-situ questions should manifest as a nonlocal dependency formation.
To date, there has been limited research on the processing of in-situ wh-questions. In French, for example, most research has focused on the production of prosodic features or the acceptability of in-situ wh-questions, but not on how these questions are interpreted incrementally (see Adli, 2004, 2006; Beyssade et al., 2007; Delattre, 1966; Déprez et al., 2013; Wunderli, 1983, 1984; Oiry, 2011; Tual, 2017a, 2017b, as discussed in Glasbergen-Plas, 2021). Ueno and Kluender’s (2009) study of Japanese wh-in-situ constructions showed an effect of longer-distance covert dependency formation, manifested as a right-lateralized anterior negativity (RLAN). In Japanese, however, the question marker (also a scope marker) is morphologically overt. In Mandarin Chinese, studies by Xiang et al. (2013, 2015) examined the processing of in-situ complex wh-questions (i.e., which x questions) of different lengths (mono-clause vs. embedded clause) in comparison with declaratives (mono-clause vs. embedded clause) using the Speed-Accuracy Tradeoff (SAT) methodology. They looked at differences in wh-dependency length across their stimuli and found that length had an impact on processing accuracy but not on processing speed. Their results showed that questions such as (7b) had lower processing accuracy than those in (7a) but were equally slow in comparison to declaratives such as (6a) and (6b).




The increased processing time of wh-in-situ questions in (7a, b) was attributed to the effects of establishing a long-distance covert dependency. The effect of length on the accuracy, but not on the speed, of processing wh-in-situ questions supports the notion of a covert dependency retrieved by a content-addressable memory process (McElree, 2000; McElree et al., 2003).
Predictions for parsing in-situ wh-questions
The evolution of sentence comprehension models over the years has led to a consensus on an incremental interpretation process in which the human parser interprets the available information incrementally, building up a representation of the sentence meaning as the input unfolds, without delay. Still, available models differ in a few respects that are relevant for interpreting observed processing difficulty. A growing amount of evidence points to the predictive nature of comprehension processes (e.g., Levy, 2008; Altmann & Mirković, 2009), although there remains a debate on the interpretation of the concept of prediction and how it differs from integration processes (for a summary discussion see Pickering & Gambi, 2018 and Kuperberg & Jaeger, 2016, and counter-arguments in Huettig & Mani, 2016). In simple terms, prediction implies the activation of linguistic information before input is available. Predictive models can thus be understood in a probabilistic framework in which the parser continuously updates the projected structure and expected lexical content as information becomes available (Levy, 2008).
Declarative sentences are more frequent than questions in the world’s languages and tend to be the most unmarked of the clause types (Ma et al., 2011). Consequently, when parsing a wh-in-situ question up to the wh-phrase (i.e., the part of the sentence that is identical in wh-questions and declaratives), the parser’s initial prediction would be based on the most frequent structure. Adli (2015), in his study of spontaneous speech in French, also reported that questions (including wh-in-situ questions) constitute only 15.72% of utterances (1721 out of 10943 sentences), highlighting the dominance of declaratives in the dataset and thus reinforcing the expectation of declaratives over interrogatives. We therefore predict a processing slowdown at the wh-phrase when processing wh-in-situ questions as compared to the declarative counterpart.
The nature of the observed processing difficulty, however, can be interpreted differently depending on the theoretical processing model considered. In “classical terms,” it can be considered an indication of reanalysis to reconstruct the projected structure (Fodor & Ferreira, 1998), or of the activation of the alternative structure. Further, the level of difficulty has been proposed to be quantifiable by measures such as surprisal (Levy, 2008) and entropy (Linzen & Jaeger, 2016), which formalize the predictability of a word or structure in a given context. These measures can be estimated from corpora or, traditionally, from Cloze probability procedures. These models are considered serial, as only one interpretation is active at a given time, in contrast to models where multiple interpretations are concurrently active in parallel with different levels of activation. Under parallel activation, we can consider activation-based retrieval models (Van Dyke & Lewis, 2003; Lewis & Vasishth, 2005) or, more recently, the proposed parallel architecture model (Huettig et al., 2022). In the activation-based retrieval models, processing difficulties at the wh-in-situ site reflect reactivation of the alternative structure combined with the integration of the covert dependency. In the parallel architecture model (Huettig et al., 2022), the potential structures, encoded as a lexicon (Jackendoff & Audring, 2020), are all active simultaneously as the first words are encountered (within-item activation) with different “resting activations” linked to their frequency.
All the models described above would predict readers of Mandarin Chinese and French to have additional processing costs (observed as longer reading times) when encountering the wh-in-situ phrase, as compared to the non-wh noun phrase in the declarative counterpart. This processing cost could be due to reanalysis, reactivation or covert dependency integration. The extent to which the parser anticipates upcoming structure when there is no other cue available might be modulated by the likelihood of encountering in-situ wh-phrases in each of the languages under study. In Mandarin, an in-situ question is the only option for wh-questions. In contrast, in French, as mentioned above, Adli (2015) showed that 56.2% of the produced interrogative utterances in the Sgs database were wh-in-situ.
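As a rough illustration of how surprisal (Levy, 2008) links frequency to expected processing cost, the sketch below converts the clause-type counts reported by Adli (2015) into surprisal values in bits. It is not part of the analysis reported here: the function name `surprisal_bits` is hypothetical, and treating every non-interrogative utterance in the corpus as a declarative is a simplifying assumption.

```python
import math

def surprisal_bits(probability):
    """Surprisal of an outcome in bits: -log2 P(outcome | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

# Counts reported by Adli (2015) for the spontaneous-speech corpus.
TOTAL_SENTENCES = 10943
INTERROGATIVES = 1721

p_question = INTERROGATIVES / TOTAL_SENTENCES
# Assumption: everything that is not an interrogative counts as a declarative.
p_declarative = 1.0 - p_question

question_surprisal = surprisal_bits(p_question)
declarative_surprisal = surprisal_bits(p_declarative)
print(f"question: {question_surprisal:.2f} bits, "
      f"declarative: {declarative_surprisal:.2f} bits")
```

Under these assumptions an interrogative continuation carries the higher surprisal, which is in line with the predicted slowdown at the wh-phrase.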
The complexity and definiteness of (wh)-noun phrases
The processing studies of Mandarin Chinese mentioned above (Xiang et al., 2013, 2015) used only complex wh-phrases (i.e., which x phrases). Nonetheless, there is experimental evidence of differences in the processing of complex and simplex wh-questions in languages such as English, Dutch and Italian, in that complex wh-questions take longer to read than simplex wh-questions (De Vincenzi, 1996; Kaan et al., 2000; Donkers et al., 2011). Other studies, however, make the opposite claim, namely that the processing of complex wh-phrases is facilitated (see Frazier & Clifton, 2002; Clifton et al., 2006; Hofmeister et al., 2007; Hofmeister & Sag, 2010).
In addition, the syntactic and semantic literature has made different claims as to which type of noninterrogative noun phrase is most comparable to each type of wh-phrase, even though previous processing research on in-situ wh-questions primarily focused on comparisons with declaratives containing definite noun phrases. Evidence from the theoretical syntax and semantics literature (Cheng, 1991, 1994) shows that in Mandarin Chinese, simplex wh-words are closer to indefinite noun phrases, whereas complex wh-phrases are more akin to definite noun phrases (Giannakidou & Cheng, 2006). Previous sentence processing studies showed differences in reading time depending on the referential nature of the noun phrase being tested (e.g., Warren & Gibson, 2002, 2005; Gordon et al., 2004; Kaan & Vasić, 2004). These studies based their predictions on the Accessibility or Givenness Hierarchy (Gundel et al., 1993), which determines the accessibility of referents in the discourse and the relation between the type of noun phrase and the degree to which its antecedent is accessible. They found that, in the absence of prior discourse, definite noun phrases take longer to read than their indefinite counterparts. This is because definite noun phrases require the reader to reconstruct their referent from scratch, whereas indefinite noun phrases usually introduce new referents and do not require the reader to search for one.
Given the potential influence of noun phrase definiteness on parsing differences between wh-questions and declaratives, our experimental manipulation introduced two declarative types: one with definite and one with indefinite noun phrases in the wh-phrase position. To investigate the predictions outlined above and extend research on in-situ wh-question processing in Mandarin Chinese and the processing of complex and simplex wh-phrases, as well as studies on definite and indefinite noun phrases, we conducted four self-paced reading (SPR) experiments (see Jegerski, 2014 for a summary description). Because SPR records reading times incrementally, it is well suited for this investigation. The first two experiments, in French, compared the processing of in-situ questions with simplex object wh-phrases (qui “who”) and complex wh-phrases (quel N “which N”) with their declarative counterparts containing both definite and indefinite noun phrases. The last two experiments carried out the same comparisons in Mandarin Chinese. The next sections describe the experimental paradigm, design, and results.
Processing in-situ questions in French
Experiment 1: processing in-situ questions with simplex wh-phrases in French
As described above, research on the processing of in-situ wh-questions in French is scarce and researchers mainly concentrated on the prosodic characteristics of these questions or on their acceptability but not so much on their reading comprehension. The goal of Experiment 1 is to determine whether French in-situ questions with simplex wh-phrases (qui “who”) incur predicted processing costs at the disambiguation point in the absence of prosodic and contextual cues, compared to their declarative counterparts.
Method
Participants
Participants (n = 36, mean age = 22 years, 18 females) were all native speakers of French. They were recruited in two groups: one from the University of Nantes (France) (n = 30, mean age = 20 years, 16 females) and one from the Expat French community in the Leiden area (Footnote 6; n = 6, mean age = 35 years, 2 females). Testing participants at different locations was done for practical reasons and to achieve the required statistical power. None of the participants suffered from dyslexia and all of them had normal or corrected vision. All participants provided informed consent and were monetarily compensated for their participation.
Materials
We compared object in-situ wh-questions with qui “who” in (8a) with declaratives containing indefinite noun phrases such as quelqu’un “someone” in (8b) and with declaratives containing proper names such as Marie in (8c). The proper names were monosyllabic (n = 9) or bisyllabic (n = 15), and half were masculine (n = 12) and half feminine (n = 12). An example of a stimulus set is given in (8).Footnote 7 The sentences were presented word by word, incrementally from left to right.



The experiment consisted of 24 sets of three sentences distributed across three lists in a Latin Square design, which were combined with 72 filler sentences of similar length. Half of the fillers were questions and the other half declaratives.
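The Latin Square distribution described above can be sketched as follows. The rotation rule (condition index = (item + list) mod 3) and the condition labels are assumptions for illustration; the text does not specify how items were actually rotated across lists.

```python
CONDITIONS = ("wh", "indefinite", "proper_name")  # illustrative labels
N_ITEMS = 24  # number of experimental item sets stated in the text

def build_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Return one dict per presentation list, mapping item number -> condition."""
    n_lists = len(conditions)
    lists = []
    for list_index in range(n_lists):
        # Rotate the condition assigned to each item by the list index.
        assignment = {
            item: conditions[(item + list_index) % n_lists]
            for item in range(1, n_items + 1)
        }
        lists.append(assignment)
    return lists

lists = build_lists()
```

With this rotation, each list contains eight items per condition, and every item appears in each condition exactly once across the three lists.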
The modifier of the subject noun phrase le braqueur “the robber” varied minimally in length, between two and three words. Most of the items (i.e., 20/24) contained two-word modifiers for the subject, such as de banque “of bank” in le braqueur de banque “the bank robber” (8). The region dans sa fuite “on his escape” following the critical position given in bold (i.e., wh-word qui, indefinite quelqu’un, or proper name Marie) also differed minimally in length across items, ranging from three words (in 15/24 sentences) to four words (in 9/24 sentences). This variation was kept so that the stimuli would sound as natural as possible to French speakers. All materials were checked for grammaticality and naturalness by a French native speaker.
Procedure
Participants signed an informed consent form before the experiment in compliance with the Ethics Code for linguistic research in the Faculty of Humanities at Leiden University. A self-paced reading, word-by-word moving window task (Just et al., 1982; Aaronson & Scarborough, 1976) was conducted on a MacBook Pro laptop running the software Linger (Rhode, 2003) in a quiet room at the University of Nantes and at Leiden University. Each trial began with a row of dashes masking each word in the sentence. Participants could therefore see the length of the sentence but not the words behind the dashes. Participants were asked to press the space bar to read the sentence word by word and to reply to the comprehension question that appeared immediately afterwards on a different screen by pressing the “F” (YES) or “J” (NO) buttons. These responses were indicated with a sticker above the corresponding keys. As participants pressed the space bar to read the sentences, each word was revealed individually and the previously read word disappeared. The punctuation mark at the end of the sentence, which unambiguously determined the interrogative or declarative nature of the sentence, appeared together with the last word of the sentence. This meant that, in principle, readers of French could not determine from the punctuation whether they were reading a question or a declarative until they reached this sentence-final position. A word-by-word moving window was therefore chosen to assess, word by word, what readers predicted about upcoming material. To keep participants attentive, each sentence was followed by a yes/no comprehension question. The experiment lasted approximately 30 minutes. An example question for item (8a) (repeated here) is shown below.Footnote 8

Comprehension Question:
Est-ce un braqueur de bijouterie qui a blessé quelqu’un dans sa fuite ?
Was it a jewelry store robber who injured someone on his escape? (Answer: No)
Reading time data analysis
All trials (regardless of whether the corresponding comprehension question was answered correctly) were included in the analysis. The average comprehension accuracy for the 36 participants was 96% (SD = 1.95%); thus, no participant was excluded on this basis. There was no significant difference in accuracy between declaratives (97.7%) and questions (96.9%) (χ2(1, N = 859) = 0.277, Fisher’s p = 0.49).
The regions used for the analysis corresponded to single words, except for those cases where French clusters the determiner or preposition with the noun by means of an apostrophe (e.g., l’infirmière “the nurse,” d’une “of one”). The collected reading time data was inspected and outlier data points with reading times smaller than 150 ms or larger than 2000 ms were removed. The total number of discarded data points represented ∼1% of the complete data including both fillers and experimental sentences.
There is experimental evidence that word length and frequency affect reading time both in eye tracking (e.g., Kliegl et al., 2004; Hyönä & Olson, 1995) and in self-paced reading studies (e.g., Bultena et al., 2014), with low-frequency and longer words both showing increased fixation or reading durations, which is associated with a higher processing cost. To avoid possible confounding effects unrelated to our experimental manipulation and research questions, we addressed the impact of word length and word frequency (when relevant) of the critical regions on the obtained reading times by conducting an ad hoc analysis. First, we tackled the relation of reading time with word length in two ways: by calculating length-corrected residual reading times (RSRT) (Ferreira & Clifton, 1986) and by considering individual experimental items as a random factor in the mixed effects model analysis (Barr et al., 2013). The reading time was residualized by computing a linear regression between word length and reading time for each subject and then subtracting the predicted reading time from the observed reading time for each word. The resulting RSRT were used for all subsequent analyses. Second, to account for the possible effects of word frequency, the experimental dataset was expanded with the information contained in the Lexique database (New et al., 2004) for French. This was done by extracting a frequency of use for each critical word and matching it to the relevant syntactic category. For inflected words, we used the frequency of the lemma to account for possible effects of word familiarity, whereas in the clusters containing an apostrophe (e.g., l’infirmière “the nurse,” d’une “of one”) we used the frequency of the noun (e.g., infirmière “nurse”).
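The trimming and residualization steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the script used for the reported analysis: word length is measured in characters, trimming is applied before the regression, and the function name `residual_reading_times` and the tuple layout of the input are hypothetical.

```python
from collections import defaultdict

MIN_RT_MS, MAX_RT_MS = 150, 2000  # trimming bounds stated in the text

def residual_reading_times(rows):
    """rows: iterable of (subject, word, rt_ms) tuples.

    Returns a list of (subject, word, residual_ms), where the residual is the
    observed RT minus the RT predicted from word length by a per-subject
    ordinary least squares regression.
    """
    by_subject = defaultdict(list)
    for subject, word, rt in rows:
        # Discard outliers below 150 ms or above 2000 ms.
        if MIN_RT_MS <= rt <= MAX_RT_MS:
            by_subject[subject].append((word, len(word), rt))

    residuals = []
    for subject, obs in by_subject.items():
        n = len(obs)
        mean_len = sum(length for _, length, _ in obs) / n
        mean_rt = sum(rt for _, _, rt in obs) / n
        sxx = sum((length - mean_len) ** 2 for _, length, _ in obs)
        sxy = sum((length - mean_len) * (rt - mean_rt) for _, length, rt in obs)
        # Fall back to a flat line if all words have the same length.
        slope = sxy / sxx if sxx else 0.0
        intercept = mean_rt - slope * mean_len
        for word, length, rt in obs:
            residuals.append((subject, word, rt - (intercept + slope * length)))
    return residuals
```

A positive residual then indicates that a word was read more slowly than that participant's word-length trend would predict.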
We analyzed differences in the RSRT at two regions, the site of the disambiguation (wh-question or NP: qui/quelqu’un/Marie) and the following word, to account for possible spillover effects (Footnote 9; Vasishth, 2006), using Linear Mixed Effects Regression (LMER; Baayen et al., 2008) by means of the statistical computing language R (R Core Team, 2016) and the lme4 package (Bates et al., 2015). The model included one fixed-effect factor, Condition, with three levels (wh-word, indefinite and Proper Name). In the region of the wh-question/NP (region 8 in Figure 1), as shown in (8), all the experimental items consist of the same pronouns qui or quelqu’un or a Proper Name, so we did not include word frequency in the model for that region.Footnote 10 In the region immediately after the wh-site, in addition to Condition, a fixed effect for Word Frequency was considered based on the log-transformed, centered word frequency as extracted from Lexique. The maximal random effects structure justified by the model was considered (Barr et al., 2013): variance introduced by subjects and items was modeled as random intercepts. In addition, we considered random slopes by subject for the factor Condition.

Figure 1. Mean RSRT per word for the comparison between in-situ questions with simplex wh-phrases (Qui), declaratives with indefinites (Quelqu’un) and proper names (Marie) in Experiment 1. Bars indicate the standard error per region.
The best-fitting model was identified by likelihood ratio tests comparing models that included and excluded the relevant effect, and comparing them against a “null” model containing only an intercept parameter and the random effects structure. When a significant effect of Condition was found, a follow-up analysis was performed to assess whether the wh-in-situ question behaved differently from the two types of declaratives.Footnote 11
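The model comparison described above rests on the likelihood ratio statistic, that is, twice the difference in log-likelihood between two nested models, evaluated against a chi-square distribution. A minimal sketch for the one-degree-of-freedom case follows. The function name is hypothetical, and in practice the log-likelihoods would come from the fitted mixed models rather than being supplied by hand.

```python
import math

def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (chi_square_statistic, p_value). For one degree of freedom the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    statistic = max(statistic, 0.0)  # guard against tiny negative values
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

Comparisons involving more than one additional parameter would need the general chi-square survival function, which this sketch does not cover.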
Results
Figure 1 shows the average RSRT at the different regions of the experimental items against a sample sentence for reference. As shown in this figure, two regions show significant effects: the critical region (i.e., wh-word “Qui”/indefinite “Quelqu’un”/Proper Name “Marie”) and the immediately following region (i.e., the preposition “dans”). In both regions, the in-situ wh-question condition in (8a) is read significantly more slowly than its indefinite declarative counterpart in (8b). The definite declarative with a proper name in (8c) is read significantly more slowly than the indefinite declarative in (8b) only at the critical word region. There is a difference between the definite declarative in (8c) and the wh-question condition in (8a) at the region following the critical region (i.e., “dans”), where the question appears to be read more slowly, but this difference did not reach statistical significance.
Post hoc analysis at the critical region (region 8: Qui/Quelqu’un/Marie), presented in Table 1, confirmed that both in-situ questions with simplex wh-phrases (qui) (D = 63.70 ms, χ2(1) = 12.37, p = 0.001) and declaratives with proper names (Marie) (D = 46.99 ms, χ2(1) = 8.43, p = 0.007) were read significantly more slowly than declaratives that contain indefinites (quelqu’un). No significant difference was found between the reading time of in-situ questions (qui) and declaratives that contain proper names (Marie). At the region immediately after the critical region (region 9: dans in the example in (8)), we observe a significant increase in reading time in the interrogative condition when compared to the declarative with an indefinite pronoun (D = 31.72 ms, χ2(1) = 6.11, p = 0.04), but no significant difference in reading time when comparing the interrogative condition with the declarative with a proper name.
Table 1. Pairwise comparison for RSRT at the critical region “Qui/Quelqu’un/Marie” (region 8) and following word “dans” (region 9) in Experiment 1. P-values adjusted with the Holm method for multiple comparisons

* p < 0.05, ** p < 0.01, *** p < 0.001.
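The Holm adjustment named in the caption of Table 1 is a step-down procedure: the smallest raw p-value is multiplied by the number of comparisons, the next smallest by one fewer, and so on, with the adjusted values forced to be non-decreasing and capped at 1. The sketch below illustrates that procedure; the function name is hypothetical and no values from the reported tables are used.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the raw p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A comparison is then declared significant when its adjusted p-value falls below the chosen alpha level.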
Table 2 provides a summary of the maximal fitted model for the wh/NP disambiguation site (region 8: Qui/Quelqu’un/Marie) and the region after (region 9), respectively.
Table 2. Model summary for RSRT at the critical region “Qui/Quelqu’un/Marie” (region 8) and following word “dans” (region 9) in Experiment 1

* p < 0.05, ** p < 0.01, *** p < 0.001.
p-values calculated based on conditional F-tests with Kenward-Roger approximation.
Marginal R2 based on Nakagawa et al. (2017).
The results above show the expected increased effort in processing in-situ questions with a simplex wh-phrase, such as qui “who,” in French, when compared to declaratives with an indefinite NP. However, this effect is absent when the questions are contrasted with a declarative with a definite (proper name) NP. Furthermore, the same processing difficulty is observed between the two declaratives: Proper Name conditions such as (8c) are also read more slowly than indefinite declaratives such as (8b). This difference between indefinites and proper names relative to in-situ questions with a simplex wh-phrase can be attributed to the greater integration difficulty of proper names compared to other definite or indefinite noun phrases (Ledoux et al., 2007; Camblin et al., 2007). This finding aligns with the Accessibility Hierarchy (Gundel et al., 1993), which posits that indefinites require minimal contextual information for interpretation, while definites and proper names necessitate prior knowledge of the referent.
Experiment 2: processing in-situ questions with complex wh-phrases in French
The sentence processing literature has shown that complex wh-phrases presented in isolation produce longer reading times than their simplex wh-phrase counterparts (see De Vincenzi, 1996; Donkers et al., 2011). In Experiment 2, we compared, again in the absence of prosodic and contextual information, the processing of in-situ questions with complex wh-phrases (e.g., quelle caissière “which cashier,” quel garçon “which boy”) with declaratives with a definite or indefinite noun phrase at the wh-phrase position. Our prediction, again following the hypothesis described earlier of a bias toward a declarative interpretation (consistent with the frequency data reported by Adli, 2015), is that in-situ questions with complex wh-phrases will be read more slowly than both declarative definite and declarative indefinite sentences in French.
The aim of this experiment was to examine: first, if complex wh-phrases lead to comparable processing cost as the processing of in-situ questions with simplex wh-phrases, as examined in Experiment 1 and secondly, whether the contrast between complex wh-phrases and their declarative definite/indefinite counterparts will show similar effects in terms of timing and effect size as those observed in Experiment 1. The second research question is motivated by the syntactic and semantic debate regarding the comparability of different noun phrase and of wh-phrase types (Giannakidou & Cheng, Reference Giannakidou and Cheng2006), as discussed above.
Method
Participants
Participants (n = 36, mean age = 22 years, 25 females) were all native speakers of French. They were recruited in two groups: one from the University of Nantes (n = 31, mean age = 20 years, 22 females) and one from the French expat community in the Leiden area (Footnote 12) (n = 5, mean age = 35 years, 3 females). As in Experiment 1, participants were tested at different locations for practical reasons and to achieve the statistical power we needed. Participants in this experiment were different from those in Experiment 1 to avoid repetition of the content in the experimental stimuli, which differed minimally at the critical region. None of the participants suffered from dyslexia and all of them had normal or corrected vision. They all provided informed consent and were monetarily compensated for their participation.
Materials
A sample set of stimuli for Experiment 2 is given in (9) (Footnote 13). Here, we compared object in-situ wh-questions formed with complex wh-phrases such as quelle caissière “which cashier” in (9a) with indefinite noun phrases such as une caissière “a cashier” in (9b), and with definite noun phrases such as la caissière “the cashier” in (9c).



Experiment 2 consisted of 24 sets of three sentences distributed across three lists in a Latin Square design, which were combined with 72 filler sentences of similar length. Half of the fillers were questions and half were declaratives. The fillers of Experiment 2 differed slightly from those of Experiment 1 to match the variation at the critical wh-phrase/noun phrase region.
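The Latin Square distribution described above can be illustrated with a minimal sketch. The rotation scheme, the function name latin_square_lists, and the condition labels are assumptions made for illustration; the text does not specify how items were rotated across lists.

```python
def latin_square_lists(n_items=24, conditions=("wh", "indefinite", "definite")):
    """Rotate conditions over items so that each list contains every item once.

    Hypothetical helper: the rotation rule is an assumption, not the
    procedure reported for the experiments.
    """
    n_cond = len(conditions)
    lists = []
    for list_idx in range(n_cond):
        # Item i appears in condition (i + list_idx) mod n_cond on this list.
        assignment = {
            item: conditions[(item + list_idx) % n_cond]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists()
for idx, assignment in enumerate(lists, start=1):
    counts = {c: list(assignment.values()).count(c) for c in set(assignment.values())}
    print(f"List {idx}: {counts}")
```

Under this scheme each participant sees every item in exactly one condition, with 8 items per condition per list, and every item occurs in all three conditions across the three lists.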
Procedure
The Ethics Protocol and experimental procedure were the same as in Experiment 1. As in Experiment 1, we chose to present the stimuli in a strict word-by-word manner, rather than as constituents (e.g., which book). Both methodological approaches have previously been used to present stimuli consisting of complex wh-phrases. In research employing the SPR methodology, the majority of studies presented them as constituents, where the wh-determiner (which) of the complex wh-phrase was displayed to readers together with the noun with which it formed the complex wh-phrase (e.g., quale ingegnere “which engineer” in Italian (De Vincenzi, 1991, 1996), welke bediende “which servant” in Dutch (Donkers et al., 2011), and nǎxiē guānyuán “which-PL officials” in Mandarin Chinese (Xiang et al., 2013; Xiang et al., 2015)). Other studies, such as Kaan et al.’s (2000) ERP study, adopted the word-by-word approach where the wh-determiner was presented separately from the noun in complex wh-phrases (e.g., which popstar). Since the main aim of our study was a direct comparison between the processing of declaratives and in-situ wh-questions, and we wanted to keep a close parallel between the way simplex and complex wh-phrases were presented and read by participants, we kept a word-by-word presentation for in-situ questions with both simplex and complex wh-phrases. This allowed the comparison between the incremental processing of the two kinds of questions and their declarative counterparts to be as closely matched as possible.
Reading time data analysis
The analysis procedure was as in Experiment 1, with the following exceptions: i) the mixed-effects model considered the same predictors, interactions, and random effects, but in this case Word Frequency was considered in all regions; ii) the analysis was performed in three regions: the two corresponding to the complex wh-phrase, and the immediately following region “dans”. The average comprehension accuracy for all 36 participants was 95.3%, with no significant difference in accuracy between declaratives (95.3%) and questions (95.5%) (χ2(1, N = 856) = 0.0001, Fisher’s p = 1.0), so no participant was excluded from the analysis. As in Experiment 1, each word was considered a region, except for clusters of a determiner (or preposition) plus noun connected via an apostrophe. The same exclusion criteria for outliers were used as in Experiment 1, resulting in 1.5% of the data being discarded, including both filler and experimental items (Footnote 14).
Results
Figure 2 shows the average RSRT per region for the three conditions described in (9) with a sample sentence. As can be seen in the figure, there are two main effects. The first occurs at the noun within the critical wh-phrase/noun phrase, where the in-situ questions with complex wh-phrases (“quelle caissière”) in (9a) are read significantly more slowly than the declaratives with indefinites (“une caissière”) in (9b) at the noun “caissière” (cashier). The second occurs at the following region, the preposition “dans”, where the same contrast is present. The declaratives with definites (“la caissière”) in (9c) are read faster at the noun “caissière” (cashier) and at the preposition “dans” than the interrogatives in (9a), but this difference is not significant.

Figure 2. Mean RSRT per word region for the comparison in Experiment 2 between in-situ questions with complex wh-phrases (“quelle caissière”) and declaratives with indefinites (“une caissière”) and definites (“la caissière”).
Table 3 contains the mixed-effects model for each of the three regions of interest. The fitted LMER model included random intercepts for both Subject and experimental Item and random slopes by Condition for Subject. A significant effect of the experimental manipulation was observed in regions 9 and 10 (caissière “cashier” and dans “in” in example (9)). Word Frequency did not affect the observed results, and including it did not improve the model fit for the observed reading times.
Table 3. Model summary for RSRT at the critical regions 8, 9 and 10 in (9) (“quelle”, “caissière,” and “dans”) in Experiment 2

* p < 0.05, ** p < 0.01, *** p < 0.001.
p-values calculated based on conditional F-tests with Kenward-Roger approximation.
Marginal R2 based on Nakagawa et al. (2017).
As in Experiment 1, a follow-up analysis of the regions with significant differences (regions 9 and 10: caissière “cashier” and dans “in”) was performed. Post hoc pairwise comparisons (see Table 4 and Figure 3) confirmed that complex wh-in-situ questions (quelle caissière) are read significantly more slowly than indefinites (une caissière) at the noun part of the wh-phrase (caissière) (D = 47.30 ms, χ2(1) = 5.80, Pr(>χ2) = 0.048). However, there is no significant difference between complex wh-in-situ questions and declarative sentences with definite noun phrases (la caissière) (D = 30.97 ms, χ2(1) = 2.10, Pr(>χ2) = 0.29). In addition, the effect appears to continue in the region after the wh-phrase (the preposition “dans” in (9)), where again only the indefinite declarative shows a significant difference from the interrogative condition (D = 55.85 ms, χ2(1) = 12.84, Pr(>χ2) = 0.001). Declaratives with definite noun phrases (as in (9c)) do not show a significant difference from the interrogative (9a), but, as in the case of questions with simplex wh-phrases, show longer reading times than indefinite declaratives (9b), although this difference is only marginally significant (D = 29.56 ms, χ2(1) = 4.84, Pr(>χ2) = 0.055).
Table 4. Pairwise comparison for RSRT at the noun region in the complex wh-phrase (region 9: “caissière” in (9)) and the following word (region 10: “dans” in (9)) in Experiment 2

° p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.
p-values adjusted with Holm method for multiple comparisons.
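For readers unfamiliar with the Holm method mentioned in the table note, the following is a minimal sketch of the step-down adjustment. The function name holm_adjust and the example p-values are hypothetical and are not taken from Table 4; the analysis software used for the reported values may differ in implementation details.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    Sketch of the standard step-down procedure: the k-th smallest p-value
    (k starting at 0) is multiplied by (m - k), and monotonicity is then
    enforced with a running maximum, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = (m - rank) * p_values[idx]
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical unadjusted p-values for three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print([round(p, 3) for p in holm_adjust(raw)])
```

With three comparisons, the smallest p-value is multiplied by 3, the next by 2, and the largest by 1, so the adjustment is most severe for the strongest effect and never makes a larger raw p-value end up smaller than a smaller one.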

Figure 3. Mean RSRT in Experiment 2 at the wh-phrase word “caissière” (left) and at the word after the wh-phrase “dans” (right). Black dot indicates the mean value, and bars indicate the 95% bootstrapped confidence interval of the mean.
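As an illustration of how a 95% bootstrapped confidence interval of the mean, like the one plotted in Figure 3, can be obtained, here is a minimal sketch. It assumes a percentile bootstrap with 2000 resamples; the caption does not state which bootstrap variant or how many resamples were used, and the reading times below are invented for the example.

```python
import random
import statistics


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (assumed variant)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical reading times in ms, for illustration only.
sample = [400, 450, 500, 550, 600, 420, 480, 530]
low, high = bootstrap_ci_mean(sample)
print(f"mean = {statistics.fmean(sample):.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

The fixed seed only makes the sketch reproducible; it has no counterpart in the reported analysis.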
The results show a pattern similar to that found for in-situ questions with simplex wh-phrases and are consistent with the expected increased effort in the processing of in-situ questions with wh-phrases in French compared to declaratives. Again, this is only the case with indefinite declaratives. In the case of complex wh-phrases, the effect is observed not when the wh-word is encountered, at the determiner position, but rather at the wh-phrase boundary (noun). Further, it extends to the word immediately after the wh-phrase, in an apparent spillover effect.
Qualitative comparison of results from processing French in-situ wh-questions
In general, the results of Experiments 1 and 2 support the hypothesis that, in the absence of prosody and contextual information, in-situ wh-questions in French are not anticipated during parsing. Both in-situ questions with a simplex wh-phrase and in-situ questions with a complex wh-phrase are processed significantly slower than indefinite declaratives.
The finding that indefinites (e.g., quelqu’un “someone”, une caissière “a cashier”) in declaratives are read faster than in-situ wh-questions aligns with previous research (Warren & Gibson, 2002; Warren & Gibson, 2005; Gordon et al., 2004; Kaan & Vasić, 2004) reporting longer reading times for definite noun phrases in the absence of previous discourse. This pattern supports the Accessibility Hierarchy (Gundel et al., 1993), which posits that indefinites require less processing effort due to their lower referential demands. In our case, experimental sentences were presented in isolation and no prior discourse was introduced; therefore, the processing of a definite noun phrase or proper name (Ledoux et al., 2007; Camblin et al., 2007) can be as costly (and therefore result in longer reading times) as the processing of an in-situ wh-phrase.
With respect to the comparison between in-situ simplex and complex wh-phrases, the results show a timing difference: for simplex wh-phrases, effects occur already at the position of the wh-phrase qui “who” and at the word immediately after, whereas for complex wh-phrases, effects are apparent at the wh-phrase boundary (noun) and at the word immediately after this noun. With complex wh-phrases, readers might wait for the wh-phrase boundary to build an interpretation, considering that the determiner quel(le) “which” in French is not informative on its own. The observed “spill-over effect” on the next word may also reflect the fact that processing of the wh-phrase is still open, since complex wh-phrases can be followed by a postnominal modifier (e.g., quelle caissière de supermarché “which supermarket cashier” (lit. “which cashier of supermarket”)), which is not the case for simplex wh-phrases (*qui de supermarché “*who of supermarket”).
Experiment 2 on French data indicates that parsing comparisons between wh-questions and declaratives require a consideration of semantic factors beyond in-situ wh-question processing, such as the nature of the referential elements, which may influence the processing of declaratives.
Processing in-situ questions in Mandarin Chinese
Next, we present two experiments on in-situ questions containing simplex and complex wh-phrases in Mandarin Chinese, a language that always applies the in-situ question formation strategy. The setup was similar to that of Experiments 1 and 2 in French (as far as the cross-linguistic differences allowed), so that the results for Mandarin Chinese can be compared qualitatively with those for French in-situ wh-questions.
Experiment 3: Processing in-situ questions with simplex wh-phrases in Mandarin Chinese
In Experiment 3, we compared in-situ questions with simplex wh-phrases (shéi “who”) with two types of declaratives, one containing a definite object, and the other an indefinite object.
Method
Participants
Participants (n = 36, mean age = 20 years, 16 females) were all native speakers of Mandarin Chinese and were recruited from Tsinghua University in Beijing, China. None of the participants suffered from dyslexia and all of them had normal or corrected vision. All participants provided informed consent and were compensated monetarily for their participation according to the local standards.
Materials
In this experiment, we compared in-situ wh-questions with the simplex wh-phrase shéi “who” in object position, as in (10a), with indefinite noun phrases such as rén “person/someone” in (10b), and with disyllabic proper names such as Xiǎozhāng (Footnote 15) in (10c). We constructed sentences that contained an intensional verb and no perfective marker -le to constrain the reading of bare nouns such as rén “person/someone” to a nonspecific indefinite reading (Footnote 16). Moreover, the intensional context provided two regions following the in-situ wh-phrase in which effects occurring after the critical region could be observed. An example of a set of stimuli is given in (10) (Footnote 17).



Experiment 3 consisted of 24 sets of three sentences distributed across three lists in a Latin Square design, which were combined with 72 filler sentences of similar length. Half of the fillers were questions and half were declaratives.
Procedure
The Ethics Protocol was as in Experiment 1. A self-paced-reading, word-by-word (where each word consisted of one to three characters) moving window task (Just et al., 1982; Aaronson & Scarborough, 1976) was conducted on a MacBook Pro laptop running the software Linger (Rhode, 2003) in a quiet room at Tsinghua University. Each trial began with a series of dashes, one group for each word in the sentence. Therefore, participants could see the length of the sentence but not the words behind the dashes. Participants were asked to press the space bar to read the sentence word-by-word and to reply to the comprehension question that appeared immediately afterwards on a different screen by pressing the “F” (YES) or “J” (NO) buttons. These responses were indicated with a sticker above the corresponding keys. As participants pressed the space bar to read the sentences, each word was revealed individually and the previously read word disappeared. The punctuation mark at the end of the sentence, which unambiguously determined the interrogative or declarative nature of the sentence, appeared together with the last word of the sentence. As is standard in Chinese, no spaces were provided between characters, and readers read only one-to-two-character words at a time incrementally on the screen as they pressed the space bar. To keep participants attentive, each sentence was followed by a yes/no comprehension question. The experiment lasted approximately 30 minutes.
Reading time analysis
The average comprehension accuracy for the 36 participants was 96.3%, with no difference in accuracy between questions (97.6%) and declaratives (95.7%) (χ2(1, N = 864) = 1.464, Fisher’s p = 0.23). Therefore, all trials were included in the analysis, regardless of comprehension question accuracy. All regions used for the analysis corresponded to single words, which ranged from one to two characters (Footnote 18).
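The accuracy comparison above can be illustrated with a worked chi-square computation on a 2×2 table. The counts used here are not reported in the text: they are one reconstruction that is consistent with the rounded percentages, under the assumptions that one third of the 864 trials (288) were questions and that Yates’ continuity correction was applied. The function name yates_chi_square is likewise hypothetical.

```python
def yates_chi_square(table):
    """Chi-square statistic with Yates' continuity correction for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates' correction: shrink each |O - E| by 0.5 before squaring.
            statistic += (abs(observed - expected) - 0.5) ** 2 / expected
    return statistic


# ASSUMED counts (correct, incorrect), reconstructed from rounded percentages:
# questions 281/288 = 97.6%, declaratives 551/576 = 95.7%.
accuracy_table = [[281, 7], [551, 25]]
print(round(yates_chi_square(accuracy_table), 3))  # prints 1.464
```

Under these assumptions the statistic matches the reported value, but other count combinations or a different test variant could in principle produce similar figures, so the reconstruction should be read as illustrative only.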
Following previous studies on Mandarin (Wu et al., 2012; Xiang et al., 2015; Li et al., 2019) and other East Asian languages (e.g., Kwon & Sturt, 2013 for Korean; Witzel & Witzel, 2016 for Japanese), we log-transformed the individual raw reading times (RWRTs) in each region to correct for the skewness of the distribution (Footnote 19). As in Experiments 1 and 2 for French, data points with an RWRT shorter than 100 ms or longer than 2000 ms were excluded from the analysis, affecting <1% of the data. The resulting log-RTs at each region were analyzed using linear mixed-effects models (LMER; Baayen et al., 2008) with Sentence Type (declarative vs. question) as a fixed-effect factor. In addition, random intercepts for both Item and Subject were included, as well as random slopes by Condition for Subject. The effect of Word Frequency on reading time was accounted for by including in the model a fixed factor logW, which contained the log-frequency of the specific word as extracted from SUBTLEX-CH, the Chinese database of film subtitles (Cai & Brysbaert, 2010). This variable represents the logarithm of the number of occurrences of the word in the database. As in the French simplex wh-phrase experiment, in the critical region of the wh-question for Mandarin all items have the same pronoun (谁/shéi in (10a), 人/rén in (10b)) or a proper name (as in (10c)), the latter not being found in the database; therefore, a Word Frequency factor was not included in the statistical analysis for that region. The predictor logW was centered and included in the LMER model as a fixed factor in the analyses of the other regions, where words varied across items.
Analyses were performed as in the French experiments at two sites: the disambiguation region of the wh-question (region 5 in (10): 谁/shéi, 人/rén, 小张/Xiǎozhāng) and the immediately following region (region 6 in (10): 解决/jiějué). As in Experiments 1 and 2, analyses were performed using the statistical computing language R (R Core Team, 2016) and the lme4 package (Bates et al., 2015) (Footnote 20).
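The preprocessing steps described in this paragraph (exclusion of extreme reading times, log transformation, and centering of the frequency predictor) can be sketched as follows. This is an illustration in Python rather than the R code used for the reported analyses; the function names, the use of the natural logarithm, and the example values are all assumptions, since the text does not state the base of the logarithm.

```python
import math


def preprocess_rts(raw_rts_ms, lower_ms=100, upper_ms=2000):
    """Drop reading times outside [lower_ms, upper_ms] and log-transform the rest.

    Natural log is an assumption; the source does not give the base.
    """
    kept = [rt for rt in raw_rts_ms if lower_ms <= rt <= upper_ms]
    return [math.log(rt) for rt in kept]


def center(values):
    """Subtract the mean so that the predictor has mean zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical raw reading times (ms) and word counts, for illustration only.
raw_rts = [50, 100, 400, 2000, 2500]
log_rts = preprocess_rts(raw_rts)
log_w = center([math.log(count) for count in [120, 4500, 87000]])
print(len(log_rts), [round(v, 3) for v in log_w])
```

Centering leaves the slope of the frequency predictor unchanged and only shifts the intercept, which makes the intercept interpretable as the estimate at average log-frequency.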
Results
Table 5 shows the final best-fitting LMER model. A fixed effect of Condition and random intercepts by Subject and Item were retained in the final model. Random slopes did not significantly improve the fit.
Table 5. Model summary for RSRT at the critical region 5 (shéi/rén/Xiǎozhāng) and the immediately following region 6 (jiějué) in (10) in Experiment 3

* p < 0.05, ** p < 0.01, *** p < 0.001.
Figure 4 shows the average LogRT at each region for all three experimental conditions in (10) against a sample sentence for reference. As shown in this figure, and corroborated by the post hoc analysis in Table 6, two regions show significant effects. One is the critical region (i.e., 谁/shéi, 人/rén, 小张/Xiǎozhāng) and the other is the immediately following region (i.e., 解决/jiějué “to solve”). At the critical region, the in-situ wh-phrase (谁/shéi) in (10a) is read significantly faster than its definite declarative counterpart with a proper name in (10c) (D = -0.023, χ2(1) = 7.00, Pr(>χ2) = 0.02). No other comparisons reached significance at that region. In the immediately following region (i.e., 解决/jiějué “to solve”), however, both the in-situ question with the simplex wh-phrase (D = 0.031, χ2(1) = 10.27, Pr(>χ2) = 0.004) and the definite declarative with a proper name (D = 0.029, χ2(1) = 9.07, Pr(>χ2) = 0.005) are read significantly more slowly than the indefinite declarative.

Figure 4. Mean Log Reading times per word/region for the comparison between in-situ questions with simplex wh-phrases with shéi and declaratives in Experiment 3.
Table 6. Pairwise comparison for RSRT at the simplex wh-phrase region (region 5: “shéi/rén/Xiǎozhāng” in (10)) and the following word (region 6: jiějué in (10)) in Experiment 3

* p < 0.05, ** p < 0.01, *** p < 0.001.
p-values adjusted with Holm method for multiple comparisons.
In other words, for in-situ questions with a simplex wh-phrase (shéi “who”) in Mandarin Chinese, the expected increase in reading time for questions compared to declaratives is only observed with respect to the indefinite declarative condition (rén “person/someone”), at the region after the wh-word. The unexpectedly fast reading of the interrogative compared with the proper name condition at the wh-phrase position could be attributed to the difference in length between the conditions. Unlike the region that immediately follows the wh-phrase (i.e., 解决/jiějué “to solve”), where all items are of equal length, the wh-phrase region varies in length: proper names consist of two characters (i.e., 小张/Xiǎozhāng), while the wh-phrase and the indefinite consist of only one (i.e., 谁/shéi, 人/rén). Although this is a small variation, word length in characters in Chinese has been shown to affect reading time in measures of fixation duration in eye-tracking studies (Zang et al., 2018).
Experiment 4: Processing in-situ questions with complex wh-phrases in Mandarin Chinese
In Experiment 4, we compared in-situ questions with complex wh-phrases in Mandarin Chinese (nǎgè tóngxué “which classmate”) with declaratives that contained a non-interrogative noun phrase (e.g., definite: nàgè tóngxué “the classmate”; indefinite: yígè tóngxué “a classmate”).
Method
Participants
Participants (n = 54, mean age = 27 years, 31 females) were all native speakers of Mandarin Chinese and were recruited from the pool of MA and PhD students from China studying at Leiden University. Participants were recruited locally instead of in China for practical reasons. None of the participants suffered from dyslexia and all of them had normal or corrected vision. All participants provided informed consent and were monetarily compensated according to the local standards.
Materials
In this experiment, we compared object in-situ wh-questions with wh-phrases such as nǎgè tóngxué “which classmate” in (11a), with declaratives that contained indefinite noun phrases such as yígè tóngxué “a classmate” in (11b), and declaratives that contained definite noun phrases such as nàgè tóngxué “the classmate” in (11c). As in Experiment 3 on in-situ questions with simplex wh-phrases in Mandarin Chinese, we tested sentences that had an intensional verb and no perfective marker -le. A set of sample stimuli is given in (11) (Footnote 21).



Experiment 4 consisted of 24 sets of three sentences distributed across three lists in a Latin Square design, which were combined with 72 filler sentences of similar length. Half of the fillers were questions and half were declaratives.
Procedure
The Ethics Protocol and procedure were the same as in Experiment 3, except that testing took place at Leiden University in the Netherlands.
Reading time analysis
The analysis followed the same procedure as in Experiment 3. Starting from a maximal model, the simplest best-fitting model explaining the data was retained. This model contained the same predictors, interactions, and random effects as those used for Experiment 3. The average comprehension accuracy for the 54 participants was 95.4%, with similarly high accuracy on both questions (94.1%) and declaratives (96.7%), although accuracy was slightly but significantly higher for declaratives (χ2(1, N = 1280) = 4.228, Fisher’s p = 0.04). As in Experiment 2, we analyzed the regions corresponding to the wh-phrase/NP (region 5, nǎgè “which”/yígè “a”/nàgè “the/that”, and region 6, tóngxué “classmate”) and the immediately following word (region 7, jiějué “to solve”) (Footnote 22).
Results
Figure 5 shows the average LogRT at each region for all three experimental conditions in (11) against a sample sentence for reference. As shown in this figure and corroborated by the statistical analysis (Table 7), two regions show significant effects. One is the determiner of the wh-phrase/NP in the critical region (i.e., nǎgè “which”/yígè “a”/nàgè “the/that”) and the other is the immediately following noun region (i.e., tóngxué “classmate”). The word immediately following the wh-phrase/NP (jiějué “to solve”) does not show any differences. At the start of the wh-phrase/NP, the wh-in-situ condition (i.e., nǎgè “which”) in (11a) is read significantly more slowly than the indefinite declarative (yígè “a”) in (11b) (D = 0.023, χ2(1) = 7.31, Pr(>χ2) < 0.05). In the immediately following region (i.e., tóngxué “classmate”), the wh-in-situ condition in (11a) is again read significantly more slowly than both the indefinite declarative in (11b) (D = 0.055, χ2(1) = 30.11, Pr(>χ2) < 0.001) and the definite declarative (nàgè “the/that”) in (11c) (D = 0.019, χ2(1) = 4.47, Pr(>χ2) < 0.05). Further, the indefinite declaratives in (11b) were read significantly faster than the definite declaratives in (11c) (D = -0.037, χ2(1) = 17.47, Pr(>χ2) < 0.01) at this noun position.

Figure 5. Mean log reading times per word/region for the comparison between in-situ questions with complex wh-phrases (nǎgè tóngxué “which classmate”), declaratives with an indefinite phrase (yígè tóngxué “a classmate”) and declaratives with a definite phrase (nàgè tóngxué “that classmate”) in Experiment 4.
Table 7. Pairwise comparison for RSRT at the wh-complex phrase regions (region 5: nǎgè/yígè/nàgè and region 6 tóngxué: in (11)) and following word (Region 7: jiějué in (11)) in Experiment 4

* p < 0.05, ** p < 0.01, *** p < 0.001.
p-values adjusted with Holm method for multiple comparisons.
Table 8 shows the best-fitting LMER model with a retained fixed effect of Condition.
Table 8. Model summary for RSRT at the wh-complex phrase regions (region 5: nǎgè/yígè/nàgè and region 6 tóngxué: in (11)) and following word (Region 7: jiějué in (11)) in Experiment 4

* p < 0.05, ** p < 0.01, *** p < 0.001.
The results, therefore, show that in-situ questions with a complex wh-phrase in Mandarin Chinese are read significantly slower than their indefinite declarative counterparts already at the wh-determiner position with the slowdown of questions with respect to indefinite declaratives carrying over to the wh-phrase/NP boundary, the noun (i.e., tóngxué “classmate” in (11)).
Qualitative comparison of results on processing Mandarin Chinese in-situ wh-questions
Mandarin in-situ wh-questions, both simplex (shéi “who”) and complex (nǎgè tóngxué “which classmate”), exhibit slower processing than declaratives. However, the timing of these processing costs differs between wh-phrase types. Post hoc analyses revealed significantly faster reading times for declaratives with indefinite noun phrases (i.e., rén “person” and yígè tóngxué “a classmate”) compared to those with definite noun phrases (i.e., Xiǎozhāng and nàgè tóngxué “the/that classmate”) when contrasted with wh-questions.
As outlined in the initial predictions in the introduction section, question clause-type interpretation is triggered upon encountering a wh-phrase, not before. This is mainly observed in Mandarin Chinese for complex wh-questions. The evidence is less clear for simplex wh-questions, where we observe no significant slowdown at the wh-word. Instead, we observe a delayed effect one word later.
When the definiteness/indefiniteness of declaratives is considered, the slowdown is mainly observed between in-situ wh-questions and declaratives containing indefinite noun phrases. Declaratives containing definite noun phrases exhibit a distinct processing pattern.
Qualitative comparison of results of French and Mandarin Chinese
Both French and Mandarin Chinese show similar patterns where in-situ wh-questions are processed slower than indefinite declarative counterparts. The additional processing effort on the wh-in-situ sentences confirms the hypothesis that, in the absence of overt cues, the parser commits to a particular interpretation of the sentence (i.e., declarative) and only considers the interrogative interpretation when there is overt evidence for it (i.e., the wh-word). As outlined in the introduction section, the slowdown in reading time observed could be attributed to a syntactic integration process when an alternative structure needs to be either reactivated in parallel processing accounts (Jackendoff & Audring, Reference Jackendoff and Audring2020; Huettig et al., Reference Huettig, Audring and Jackendoff2022), retrieved in activation-based retrieval accounts (Van Dijk & Lewis, Reference Van Dyke and Lewis2003; Lewis & Vasishth, Reference Lewis and Vasishth2005), or re-analyzed as in “classic” processing accounts reflecting the reanalysis (Fodor & Ferreira, Reference Fodor and Ferreira1998) of the clause type of the sentence from a declarative to an interrogative interpretation, with the extra process of integrating the scope position (Spec CP) to associate it with the wh-word (see the work by Xiang et al., Reference Xiang, Wang and Cui2015 and Lo & Brennan, Reference Lo and Brennan2021 addressing specifically this process).
The present findings, demonstrating processing costs for in-situ wh-questions in both French and Mandarin Chinese, suggest that the availability of different wh-movement strategies is not a primary determinant of in-situ processing difficulty, as optional wh-in-situ languages like French also show processing difficulties. A direct comparison of the relative size of the effects is not possible cross-linguistically however, so to investigate if a modulation of the effect strength is present due to the language wh-movement strategy might not be possible using behavioral methods.
Although the processing pattern is common in both languages, French and Mandarin Chinese differ in the timing of the processing of in-situ wh-questions, depending on the nature of the wh-phrase. For simplex wh-questions, while both French and Mandaring speakers exhibit slower reading times relative to indefinite declaratives, the onset of this processing difficulty differs: it emerges at the wh-phrase position in French but one word later in Mandarin. As discussed earlier, it is not clear why at the wh-phrase position Mandarin Chinese readers show a facilitation effect of interrogatives relative to definite declaratives and answering this would require further research. For complex wh-phrases, Mandarin speakers show slowdown effects as soon as the wh-determiner is encountered whereas French speakers do not show these effects until the noun within the wh-phrase has been processed. Complex wh-phrases in French exhibit delayed processing relative to simplex wh-phrases, with effects emerging in the first post-wh-phrase region. The explanation we provide above for this delay is related to the possibility of having postnominal modification in French, which is not available in Mandarin Chinese. Further, the nature of the wh-determiner “which” (i.e., quel(le) vs. nǎgè) in each language may allow for more ambiguity in French than in Mandarin Chinese, where differences are observed at the onset of the wh-phrase.
Finally, the most pronounced difference between wh-in-situ questions and declaratives emerged in conditions involving indefinite noun phrases. Declaratives containing definite elements, such as proper names or definite noun phrases, appear to incur a processing load similar to that of wh-questions in both languages. Consistent with previous research (Warren & Gibson, Reference Warren and Gibson2002; Yen, Reference Yen2007), proper names and definite noun phrases, lacking antecedent context, are generally more demanding to integrate than other referential noun phrases. This integration difficulty likely contributes to the observed processing costs for declaratives containing these elements, aligning them with the processing challenges of wh-questions.
One limitation of the present work is that we did not use a within-subject design for the experiments conducted in each language; therefore, we could not statistically compare the simplex and complex in-situ wh-questions. This choice was made to avoid the debate, common in the literature, about how valid it is to compare a simplex and a complex wh-phrase directly when the number of words read differs (e.g., one for “who” vs. two for “which woman”). As discussed above, differences in the syllable length of our critical words across the four experiments already complicated the comparison; adding one more difference would only have complicated the interpretation of the results further.
Conclusion
In this study, we examined the word-by-word reading of in-situ questions containing simplex and complex wh-phrases in French and Mandarin and compared them with declaratives containing definite or indefinite noun phrases. Our results showed that the parser assumes a declarative interpretation when reading these sentences incrementally, regardless of the question formation strategy or strategies available in the language. Nevertheless, because several cognitive processes act concurrently, the nature of the wh-phrase (simplex or complex) and the definiteness of the declaratives must be taken into account when interpreting the processing difficulty of wh-in-situ questions in the absence of contextual or prosodic information; otherwise, incorrect conclusions may be drawn about the processes observed.
Replication package
The supplementary materials, including stimuli, data, and analysis code, are available at https://osf.io/2cwqn/.
Acknowledgments
This work was supported by The Netherlands Organisation for Scientific Research (NWO) [project number 360-70-480] to the first author and by the Chinese Scholarship Council to the second author. We would like to thank Hamida Demirdache, Xiaolu Yang and Xose Lorenzo for providing us with the venues to test participants in France, China and The Netherlands and our research assistants Sylvie Cuchet, Lucas Tual and Juliette Angot for the help provided in conducting the French studies in France and in constructing the French stimuli. Finally, we would like to thank the editor and two anonymous reviewers for their insightful comments during the revision process.
Author contribution statement
Leticia Pablos Robles: Conceptualization, Methodology, Data Acquisition, Formal analysis, Data curation, Writing—original draft. Yang Yang: Conceptualization, Methodology, Data Acquisition, Formal analysis, Writing—review & editing. Jenny Doetjes: Conceptualization, Writing—review & editing, Funding acquisition. Lisa Cheng: Conceptualization, Writing—review & editing, Funding acquisition.