1. Introduction
Describing events in everyday life is an essential form of human language use, and one indispensable event type is describing motion from one place to another. Motion is a fundamental phenomenon in human experience (e.g. Miller & Johnson-Laird 1976, Blomberg 2014), and accordingly, the cross-linguistic differences in the expressions used to describe motion have raised great interest in cognitively and typologically inclined linguistics. Expressing a spatial relation, either static or dynamic, requires the choice of a certain perspective, a ‘process of abstracting from the visual scene’, as Levelt (1996:78) asserts. A widespread and established way to investigate the choices speakers make is to assume a set of frames of reference (henceforth FoRs).
The idea of FoRs, crosscutting several fields of study, is based on our everyday experiences in physical reality: traversal of an object in physical space is necessarily judged in relation to something else, some background, reference point, or viewpoint. This is captured in the definition of translocation, a central concept in the framework Holistic Spatial Semantics (henceforth HSS; see e.g. Zlatev 2007, Zlatev, David & Blomberg 2010, Blomberg 2014) applied in this article:Footnote 1 ‘the continuous change of an object’s average position according to a spatial frame of reference’ (Zlatev et al. 2010:394). As can be seen from this definition, FoRs are centrally involved in translocation, the main research subject of motion typology (e.g. Talmy 2000). To present a simple example, the sentence John is running describes motion, but to represent a translocative motion situation,Footnote 2 it would have to include a spatial specification with at least one FoR: for example, John is running to the forest (object-centred)/this way (viewpoint-centred)/north (geocentric). To adapt the analysis better to motion situations, the FoRs applied in this article are more topological-directional than projective in nature, in the sense of Levinson (e.g. 2003). The differences are further explicated in Section 2.1.
This article focuses on the variation that arises in the strategies of spatial reference within a single language and within data collected from a rather uniform group of participants. Variation with respect to the factors generally acknowledged in the literature – language, environment, and culture – is minimised. As Palmer et al. (2017) state, these three factors can be expected to function together when determining FoRs. This article, however, poses the question of whether there are other potential explanations for variation – this time, in the context of motion. The factors explored relate to individual differences as well as the semantics of the motion situation and the language-specific resources for encoding motion; they are also most likely to have effects in languages in general rather than just in Finnish as discussed in this article. On the other hand, the degree of within-language variation varies between languages, as Montero-Melis (2021) shows. In his study, the Spanish event descriptions show considerably more individual variation than the Swedish ones.
I analyse motion descriptions collected with a set of visually presented motion scenes from 50 Finnish-speaking informants to see where the variation lies and which factors, both language-internal and language-external, can explain it. The central questions are as follows: How much variation is there in the use of FoRs in Finnish motion descriptions? How extensive is the variation between individual speakers of the same language? Is variation connected to certain types of motion situations?
I present the analysis from three different perspectives: (i) the way individual speakers produce spatial reference through different strategies, (ii) variation related to the stimuli, and (iii) the distribution of FoRs in the elicited motion descriptions in Finnish and in the corresponding descriptions of three other languages. Earlier research on spatial FoRs in Finnish has focused on static relations (e.g. Ojutkangas 2005) and location within a moving container (Teeri-Niknammoghadam, Kelloniemi & Huumo 2020).
Section 2 briefly introduces the reader to the extensive theoretical discussion on spatial FoRs in linguistics and related fields and discusses factors that have been or can be connected to the choice of FoR. It also provides the reader with background information on the central resources for expressing translocation in Finnish. Section 3 describes the elicitation method and the data acquired with it. Sections 4–6 examine the variation in the data from different perspectives: Section 4 focuses on variation between individuals, and Section 5 on variation within the stimuli. Section 6 frames the discussion on Finnish with a cross-linguistic comparison. Section 7 concludes the discussion.
2. Background
2.1 Frames of reference
Frames of reference have been classified in different ways, but what is generally seen as crucial is the difference between egocentric (viewpoint-centred or subject-based) and allocentric (object-based) FoRs (Bohnemeyer et al. 2014; Denis 2018:61–63). Allocentric FoRs can be further divided into an object-centred FoR and a geocentric FoR (with varying terminology and definitions in different models; see e.g. Bender & Beller 2014:344). Levinson’s (e.g. 2003) intrinsic, relative, and absolute FoRs are widely used in the analysis of static spatial descriptions. They can be applied to motion (see Tenbrink 2011:708–714), but they only cover a somewhat limited proportion of motion situations (see Levinson 2003:95–97).
This article applies the notion of FoR to the motion context following the HSS framework (e.g. Zlatev et al. 2010, Blomberg 2014), which defines the types as follows: The viewpoint-centred (VC) FoR involves a reference to the viewpoint of the speaker (He is in front of the bush), the addressee (He is in front of the bush from your point of view), or another person in the situation (He is in front of the bush from John’s point of view) (Zlatev 2007:329). This includes both the relative type (e.g. to the left of the house, see Levinson 2003) and the deictic type (e.g. come here). In my view, any reference to the viewpoint of the speaker is a sign of a specific perspective on the stimulus: the speaker is also evaluating the motion situation instead of just acting as an external narrator. It should be noted that in the analysis of Finnish, I judge overt references to the viewer (e.g. towards me) as primarily deictic and treat references to the camera (e.g. towards the camera) as analogous to these as the location of the camera coincides with the speaker’s viewpoint. However, in the cross-linguistic analysis in Section 6 these instances are treated as object-centred, for the sake of comparability with the results of Blomberg (2014:64) who defines these instances as object-centred based on the idea of objective construal (Langacker 1990).Footnote 3
In the object-centred (OC) FoR, the reference is made to a LandmarkFootnote 4 that can be of two types: either projective (corresponding to Levinson’s intrinsic FoR), as in The car is parked in front of the building, or non-projective (topological), as in She went to school (Zlatev 2007:329). Levinson (2003:71–72) excludes expressions of topological relations from FoRs but admits that many of them include information about axial properties or the intrinsic features of the landmark. For example, coincidence (e.g. at) is non-projective but laterality (e.g. next to) is projective (Frawley 1992:255). Blomberg (2014:64) notes that the intrinsic properties of objects in a spatial configuration can be either morphologic, as in the case of intrinsic fronts and backs, or functional, such as the fact that other objects can be placed inside hollow objects. Overall, topological relations contain central spatial information that is worth exploring in a framework that defines FoRs as the basis of translocation.
The geocentric (GC) FoR uses fixed geo-cardinal bearings to locate the Figure, as in She went north. In HSS, the vertical dimensions up and down are also included in GC (Zlatev 2007). In principle, all three FoRs are distinguishable on the vertical axis but it is difficult to separate them out from each other. A central reason for the coincidence of the frames is gravity as the fundamental basis for the upright position of humans and objects. Carlson-Radvansky & Irwin (1993:239–240) showed the domination of the geocentric FoR in connection to the English preposition above in a carefully planned test setting in which they managed to dissociate the three FoRs.
FoRs are incommensurable: a description of a tree being to the left of a house tells us nothing about the location of the tree in relation to the north-south axis, for instance (Levinson 2003). Nonetheless, when defined more broadly, FoRs can appear together in complex utterances and represent the same situation from complementary perspectives, as in They came down into the valley, in which the verb come encodes VC, the directional down GC and the expression into the valley OC.
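The way several FoRs combine in one utterance can be made concrete with a small annotation sketch. The following is only an illustration, not the annotation scheme actually used for the data: the `FoR` enum, the `combination_label` helper, and the segmentation of the example sentence are hypothetical, and the order OC, VC, GC is assumed from the combination labels used later in the article (e.g. OC+VC+GC).

```python
from enum import Enum


class FoR(Enum):
    """The three frames of reference distinguished in HSS."""
    OC = "OC"  # object-centred
    VC = "VC"  # viewpoint-centred
    GC = "GC"  # geocentric


# Assumed fixed order for building labels such as "OC+VC+GC".
LABEL_ORDER = [FoR.OC, FoR.VC, FoR.GC]


def combination_label(frames):
    """Collapse the FoR tags of one description into a combination label."""
    present = set(frames)
    return "+".join(f.value for f in LABEL_ORDER if f in present)


# Hypothetical annotation of "They came down into the valley".
annotation = [
    ("came", FoR.VC),             # deictic verb
    ("down", FoR.GC),             # vertical direction
    ("into the valley", FoR.OC),  # reference to a Landmark
]

print(combination_label(frame for _, frame in annotation))  # OC+VC+GC
```

Repeated tags of the same frame collapse into one, so a description with two Landmark references would still be labelled OC.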
2.2 Factors affecting the choice of FoR
The factors that determine the choice of FoR have been widely discussed. The effect of language on the use of FoRs, also in non-linguistic tasks, has been shown in various studies (e.g. Pederson et al. 1998, Levinson 2003). Other studies expand the discussion towards additional factors causing variation in the use of FoRs, such as different environmental, demographic, and situational factors. Some authors (e.g. Li & Gleitman 2002) emphasise the role of the environment, while others seek to combine the effect of language and other factors (e.g. Dasen & Mishra 2010, Bohnemeyer et al. 2014, Palmer et al. 2017).
The view supported in much of the recent research is that there are various factors that function together. Palmer et al. (2017:488) emphasise the complexity of the effects, stating: ‘human spatial behaviour cannot be understood by appeal solely to language or culture or environment alone’. This article builds on the complexity discovered in recent research and takes as its starting point a situation where the most central factors posited in the literature have been controlled for. The goal is to assess additional factors that possibly affect the choice of FoRs and cause variation in the descriptions of motion situations.
In the current set-up (see Section 3), the informants form a rather uniform socio-cultural group of formally educated adults speaking the same language and living in urban surroundings. The elicitation task controlled the situations to be described: the informants watched and described the same stimuli. In the following, I present a set of additional factors that are considered in relation to the FoRs in this article.
Although the informants watched the same stimuli, individual variation arises in the ways people perceive the situations and in the strategies they use to encode the central content of certain stimuli (see e.g. Tversky’s (1991) survey and route descriptions). In the literature on expressing motion, individual variation is under-investigated (however, see Montero-Melis et al. 2017, Montero-Melis 2021). As variation is known to be related to demographic factors as well as to areal differences in language use (Berthele 2013), an interesting question is also the amount of variation that occurs beyond such factors – such as the individual preferences and choices in language use within a demographically rather homogeneous group of subjects. Wide variation in spatial cognition, such as navigation skills, is also a well-attested fact (e.g. Wolbers & Hegarty 2010, Meneghetti, Pazzaglia & De Beni 2011), and such differences may influence an individual’s ability to describe spatial scenes accurately.
This article investigates FoRs in the context of motion descriptions. Thus, the choice of FoR is assumed to be affected by the situational context and semantic elements of the motion situation as well as the linguistic resources used to encode them. Variation in describing motion situations has been identified on different levels: two or three motion event expression types have been, for example, claimed to account for the substantial cross-linguistic differences in the expression of Path (e.g. Talmy 1985, 2000). In Talmy’s (2000) model, verb-framed languages usually express Path in the main verb (e.g. Spanish La mujer entra a la casa ‘The woman goes into the house’) and satellite-framed languages typically express Path in the so-called satellite elements, such as adverbs or verbal prefixes (e.g. Swedish Kvinnan går in i huset ‘The woman goes into the house’). Recently, the focus has concentrated more on inter-linguistic variation within assumed language types and on language-internal variation (e.g. Ibarretxe-Antuñano 2009, Goschler & Stefanowitsch 2013, Fagard, Stosic & Cerruti 2017, Lewandowski 2021).
The specific resources of a given language may also direct the use of FoRs and emphasise the role of some FoRs compared to those in other languages, even those that are typologically close. Variation in linguistic resources is thus expected to produce variation in utterances (e.g. Palmer et al. 2017). I present the resources of Finnish in Section 2.3, and in Section 6, I consider them in relation to the resources of Swedish, French, and Thai as analysed by Blomberg (2014).
There are further factors that should be taken into consideration, including the effects related to the elicitation task and the stimuli, such as the camera angles, the salience of different kinds of visual landmarks, and the nature of video stimuli compared to static pictures (see den Ouden et al. 2009). A thorough analysis of these factors is beyond the scope of this article, but features of individual videos are discussed when necessary to explain the results.
2.3 The resources of Finnish
In HSS, the FoRs are included in a set of semantic categories that are used for a typological analysis of motion situations (e.g. Zlatev et al. 2010, Blomberg 2014). Similarly, the typology of motion descriptions forms the wider context of this study. Finnish has been placed among the satellite-framed type by Talmy (2000:60) and in the case-framed cluster by Naidu et al. (2018) (see Section 6); however, no thorough empirical analysis concerning the expression of motion in Finnish has been presented. The more general question of how motion is verbalised in synthetic case languages with elaborate systems of local casesFootnote 5 has seldom been raised in typological contexts (however, see e.g. Ibarretxe-Antuñano 2009, Naidu et al. 2018).
In this section, I provide background information for the analysis of FoRs in motion situations, presenting a brief overview of the resources Finnish deploys in the expression of motion. I focus on the categories most relevant to the context of FoRs: PathFootnote 6 and Direction.
Zlatev et al. (2010:395–396) define the categories of Path and Direction so as to cover, respectively, bounded and unbounded trajectories (see also Miller & Johnson-Laird 1976:405–410). Bounded motion implies a state-transition of the Figure through at least one of the phases of Path: beginning (from the forest), middle (through the forest), or end (to the forest). Unbounded motion is not connected to any of these phases, and thus the trajectory is expressed as vector-like Direction (towards the forest; left) rather than Path (Zlatev et al. 2010:395).
In general, Finnish motion descriptions are characterised by flexibility of expression and a variety of different means of expression in both the grammar and lexicon, as well as on the borders between them (e.g. Tuuri 2021). The most prevalent structural feature for expressing Path is the case-marking of noun phrases. Finnish has a system of 15 cases,Footnote 7 which includes a subset of six local cases that carry meanings typically expressed by prepositions in many (Western European) languages (see Table 1). Of the local cases, two express static location and four form the standard marking of the beginning and end of Path, both displayed in (1).Footnote 8
The spatial system also includes adpositions, especially for the middle part of Path, such as halki ‘across’ in (2).
Example (2) also illustrates the position of Finnish in relation to two standard measures in motion typology: Finnish allows chaining of Path elements and does not apply the boundary-crossing constraint, that is, it allows boundary-crossing with Manner verbs.
In addition, adverbs as in (3) and Path verbs as in (4) participate in the expression of Path.
However, they typically appear as optional, complementary means that underline the meaning conveyed by the case-marking. Thus, Finnish deviates from the typical patterns of both the Swedish and the Spanish type (see also Naidu et al. 2018). A typical function for these additional means is to emphasise boundary-crossing situations, that is, entrances into or exits from a bounded space (e.g. Aske 1989).
The expression of unbounded Direction mostly consists of adverbs (e.g. ylös ‘up’, poispäin ‘away’) and adpositions (e.g. kohti ‘towards’). Verbs also express Direction with respect to all FoRs: deictic verbs typically represent VC,Footnote 9 as in (5), verbs encoding vertical directions represent GC, as in (6), and verbs such as lähestyä ‘approach’ represent OC.
A special characteristic of Finnish is the possibility to express the (un)boundedness of the trajectory with case alternation: the partitive object presents the trajectory as unbounded, as in (6), while a total object is used to express the boundedness of the trajectory (e.g. Heinämäki 1984). The account above relies on the central resources attested in the current data and is not exhaustive, but as can be seen, Finnish deploys a large set of both grammatical and lexical means in the expression of Path and Direction.
3. Method and data
The data presented in this article were collected using the elicitation tool Trajectoire (Ishibashi, Kopecka & Vuillermet 2006), an etic grid consisting of 76 filmed video-clips (see Figure 1), two of which are used as a warm-up task.Footnote 10 The videos are 8–14 seconds long and they include different kinds of situations: most of the stimuli (54)Footnote 11 include human translocation (e.g. a woman walks into a cave) but there are also instances of caused motion (e.g. a woman kicks a ball to a man) and static situations (e.g. a man lies on the lawn). Variables include different figures (women, men, and children), different landmarks (e.g. caves and forests), and different kinds of trajectories in relation to the landmarks (e.g. entering and ascending). The trajectories also vary in complexity. The Manner of motion covers three main types: walking, running, and jumping. Trajectoire has been used in data collection from a considerable number of typologically diverse languages, including, for example, Swedish, French, and Thai (Blomberg 2014), which are compared to Finnish in Section 6. The tool is described more thoroughly by Vuillermet & Kopecka (2019).
The data were provided by 50 adult native speakers of Finnish (33 female, median age 26 years) in 2013–2015. Most of the informants were university students. The informants were recruited through social media and student mailing lists, and they received a cinema ticket or partial course credit as compensation.
In the data collection sessions, the 76 videos were shown to one informant at a time. Three different viewing orders were used to control for possible effects of the order, that is, the videos already seen affecting the descriptions. The informants were asked to describe the central content of each video succinctly. The following guideline was presented orally in Finnish and as a written version on a screen: ‘You will see a series of short videos in which one or more individuals do something. After each video, please describe, in about one sentence, what happened.’ The descriptions were video-recorded and transcribed in ELAN (Sloetjes & Wittenburg 2008). Once in textual form, the data were analysed both as individual words (morphologically and semantically) and as whole descriptions (semantically). The full data set covers 3,690Footnote 12 descriptions (22,636 words) for both the motion videos and other types of situations. The analysis sections of this article concentrate on different parts of this data set, as explicated in Table 2. The changes in focus are clarified in the text.
4. Variation as determined by individual strategies
With respect to linguistic, demographic, and environmental variables, the informants in the study form a rather homogeneous group. In relation to this and to the cross-linguistic differences (see Section 6), the individual variation within the data was considerable (see Figure 2).Footnote 13 The types OC, OC+VC and OC+GC were used by all the informants. The less common VC and OC+VC+GC types were not used by all the informants, and their use was rather scattered. GC was not used alone at all. The types including VC tended to cluster: most of the informants that used the VC and OC+VC+GC types also had a considerable number of OC+VC.
OC and OC+VC were the main types of encoding FoRs. The use of GC was tied to certain videos, and informants then unanimously encoded the vertical directions in connection with these videos. The range of variation in the case of OC+GC was small, while in the case of OC and OC+VC, the ranges were considerably wider (see Figure 3).
To illustrate the differences in coding strategies more clearly, the variation was reduced to two main classes in Figure 4. The data were reorganised so that the VC class contains all the descriptions that include elements classified as VC (VC, OC+VC, OC+VC+GC). The OC class contains all the OC and OC+GC descriptions.
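The reduction to two classes can be sketched as a simple mapping from combination labels to strategy classes. This is a minimal illustration under stated assumptions rather than the procedure actually used: the function names `strategy_class` and `vc_share` are hypothetical, and labels without OC or VC (such as non-translocative descriptions) are assumed to fall outside both classes.

```python
def strategy_class(label):
    """Map a FoR combination label to the reduced two-way classification.

    Any label containing VC (VC, OC+VC, OC+VC+GC) goes to the VC class;
    OC and OC+GC go to the OC class; anything else returns None.
    """
    frames = set(label.split("+"))
    if "VC" in frames:
        return "VC"
    if "OC" in frames:
        return "OC"
    return None


def vc_share(labels):
    """Proportion of one informant's classified descriptions in the VC class."""
    classes = [c for c in map(strategy_class, labels) if c is not None]
    if not classes:
        return 0.0
    return classes.count("VC") / len(classes)


# Hypothetical labels for four descriptions by one informant.
print(vc_share(["OC", "OC+VC", "OC+GC", "VC"]))  # 0.5
```

Computing such a share per informant gives one value per speaker, which is the kind of quantity that a by-informant comparison of the two strategies needs.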
There is a correlation between the two strategies, and they are thus the main competing options of encoding FoRs. Some of the informants rather systematically avoided assessing the motion situations in relation to themselves, whereas some tended to include VC elements in more than half of their descriptions. In between, there were informants whose descriptions covered a wider internal variation.
What kind of linguistic choices, then, hide behind these strategies? The OC strategy is somewhat simple, consisting of any translocative description with a reference to one or more external Landmarks, as in examples (1)–(4), sometimes together with vertical geocentric elements, as in example (6). The inclusion of VC covers a range of different strategies: deictic verbs, as in (5), and demonstratives,Footnote 14 relative references on the lateral and frontal axes, as in (9), and overt references to the camera or viewpoint, as in (12). To provide an overview of these resources, I analysed the descriptions produced by the three informants that used VC the most, i.e. in more than 50% of their descriptions. The main strategy these informants used, in about a third of their descriptions, was an overt reference to the camera or to the viewer. The use of deictic verbs was a VC strategy almost as typical in these descriptions. The rest of the descriptions referred relatively to left and/or right,Footnote 15 or to back and/or front.
The expressions oikea ‘right’ and vasen ‘left’ can also be used intrinsically, identifying with the Figure’s viewpoint (Levinson 2003:97). These expressions were, however, strongly relative in the current data. Example (7) was the only one out of 37 descriptions containing reference to left and/or right from the Figure’s viewpoint.
There appear to be varying motivations for using VC: the viewer evaluates the situation in relation to her/his own location but there are differences in the level of participation. When using deictic verbs or adpositions such as edestä ‘by+in front of’ and takaa ‘by+behind’ in a relative way, for example, the viewer seems to be more absorbed in the situation, describing motion in relation to her/his own location (or her/his own circle of attention; see Matsumoto, Akita & Takahashi 2017). When explicitly referring to the camera, the viewer rather distances her/himself from the situation, acknowledging a border between the situation on the screen and the situation of watching the stimuli (see Tannen 1980 for cross-linguistic differences). The elicitation situation and the video format thus contribute an extra dimension. The first strategy was used by all informants to some extent. The second one was more clearly an individual strategy used either extensively or not at all. A few participants also used a strategy of referring to themselves with a first-person pronoun (e.g. minua kohti ‘towards me’), often accompanied by gestures pointing to themselves. This strategy, though objective in the sense of Langacker (1990), seems to include the participant in the motion situation in a way that resembles the use of deictic verbs.
5. Stimulus-determined variation
As stated in Section 2.2, language-internal variation in the choice of FoRs is most likely to be affected by the features of the encoded motion situations, and, in an experimental context, also the characteristics of the visual stimuli. In this section, I analyse variation with respect to the stimuli and the motion situations represented. The analysis discusses variability in the use of FoRs and considers the typical patterns of encoding different motion situations in Finnish.
The variability connected to each video was computed through Simpson’s diversity index (henceforth SDI) that indicates variability within a population, considering the number of different types and the relative representation of each type in a population.Footnote 16 The range is from 0 to 1, scores close to 0 indicating low variability and scores close to 1 indicating high variability. The SDI was calculated for each video using the frequencies of different FoRs and their combinations (OC, VC, OC+VC, OC+GC, OC+VC+GC) together with non-translocative and n/a as the values. In other words, if the SDI was close to 0, almost all participants described the stimulus unanimously with respect to FoRs, and if the SDI was close to 1, the participants used varying description strategies.
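The computation can be sketched as follows. This is a minimal illustration that assumes the common form of the index, D = 1 − Σ p_i², where p_i is the proportion of descriptions of a video that fall into category i; the exact variant used for the data is not restated here, and a finite-sample version with n(n − 1)/(N(N − 1)) terms would give slightly different values. The function name `simpson_diversity` and the example counts are hypothetical.

```python
from collections import Counter


def simpson_diversity(labels):
    """Simpson's diversity index, assumed here as 1 - sum(p_i ** 2).

    `labels` holds one category per description of a single video, e.g.
    "OC", "OC+VC", "non-translocative" or "n/a".
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one description is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical video described unanimously by 50 informants.
print(simpson_diversity(["OC"] * 50))  # 0.0

# Hypothetical video with an even split between two strategies.
print(simpson_diversity(["OC"] * 25 + ["OC+VC"] * 25))  # 0.5
```

Under this assumed form, the index cannot reach 1 with a finite number of categories: with the seven categories listed above, the maximum is 1 − 1/7, or about 0.86, reached only when all categories are equally frequent.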
The median value for SDI in the data was 0.44, the minimum was 0 and the maximum 0.74. To reach a general view of the variability, the SDIs of individual videos were analysed in relation to the trajectory type represented in each video. As Vuillermet & Kopecka (2019:103) show, the stimuli were designed to include both simple trajectories (either source-, goal- or median-oriented) and complex trajectories consisting of different combinations of the aforementioned. Adjusted to the terminology of HSS, the simple trajectories represent the beginning (henceforth beg), end or middle (henceforth mid) part of Path. As HSS makes the distinction between bounded Path and unbounded Direction, videos showing an unbounded trajectory (e.g. along a road) were classified as unbounded. Videos that show two or more phases of a trajectory (either bounded or unbounded) were analysed as belonging to the category complex.
Once the videos were organised on a scale from the highest SDI to the lowest, it became clear that the type of trajectory was not the only factor explaining the variability in the use of FoRs. As predicted in the literature (e.g. Palmer et al. 2017) and discussed in Section 2.2 above, various variables are expected to function together, and thus a clear-cut effect was not expected to be found. The types, however, showed some tendencies to focus on the low variability or high variability end of the scale. These tendencies are illustrated through the distribution of values of SDI in Figure 5 and explicated in the following analysis.
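The classification of videos by trajectory type, and the comparison of SDI values across the resulting classes, can be sketched as below. The sketch rests on assumptions: each video is represented as a list of trajectory segments, the names `trajectory_class` and `median_sdi_by_class` are hypothetical, and the SDI values in the example are invented for illustration and are not values from the data.

```python
from collections import defaultdict
from statistics import median

# Segment types assumed for this sketch: three Path phases plus unbounded Direction.
SEGMENT_TYPES = {"beg", "mid", "end", "unbounded"}


def trajectory_class(segments):
    """Classify a video by the trajectory segments it shows.

    One segment gives a simple class (beg, mid, end or unbounded);
    two or more segments, bounded or unbounded, give the class "complex".
    """
    if not segments:
        raise ValueError("a translocative video needs at least one segment")
    unknown = set(segments) - SEGMENT_TYPES
    if unknown:
        raise ValueError(f"unknown segment types: {sorted(unknown)}")
    return "complex" if len(segments) >= 2 else segments[0]


def median_sdi_by_class(videos):
    """Group (segments, sdi) pairs by trajectory class and take the median SDI."""
    grouped = defaultdict(list)
    for segments, sdi in videos:
        grouped[trajectory_class(segments)].append(sdi)
    return {cls: median(values) for cls, values in grouped.items()}


# Invented example values, for illustration only.
videos = [
    (["end"], 0.1),
    (["end"], 0.3),
    (["end"], 0.5),
    (["beg", "end"], 0.4),
    (["unbounded"], 0.6),
]
print(median_sdi_by_class(videos))
```

The median is used here because it is the summary statistic reported for the classes; it is less sensitive than the mean to single videos with unusually high or low variability.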
At the low end of the scale, the stimuli representing Path:end tended to be described rather simply. The median SDI in this class was 0.37. Path:end was dominated by simple OC descriptions of the Figure entering or reaching a Landmark, typically encoding Path with illative or allative case and motion with a Manner verb, as in (8).
The Path:mid stimuli were rather scattered on the SDI scale, the median being 0.41. This goes back to the heterogeneous nature of the motion situations included in this class. Path:mid covers variations of situations such as passing, crossing, and traversing. At the high variability end of the scale, there are situations of passing a landmark. These stimuli were often encoded with perspective-free adpositions such as ohi ‘by’ or with the Path verb ohittaa ‘pass’. Another typical option was to include the VC using adpositions such as edestä ‘by + in front of’ and takaa ‘by + behind’ in a relative way, as in (9). The SDI of the video described in (9) was 0.60.
However, situations of crossing and traversing tended to be less variable with respect to FoRs and typically included OC descriptions with adpositions such as yli ‘over, across’ and poikki ‘across’ or the Path verb ylittää ‘cross’. For example, the SDI for a video of a man jumping over a trunk while running in the forest was 0 and all the descriptions, like (10), were OC.
Complex motion situations represent intermediate variation with a median SDI of 0.42. Complexity in the trajectory does not necessarily lead to complexity with respect to FoRs: most of the descriptions only included the OC frame, or, often, more than one instance of an OC reference, as in (11).
It may be that the encoding of more than one Landmark reduces the likelihood of mentioning other aspects of the situation, such as the orientation with respect to the viewer. This is probable, especially considering the guideline to keep the descriptions succinct.
For stimuli representing Path:beg, the variability was higher than for the other phases of Path, the median SDI being 0.49. This is remarkable especially with respect to the lower variability within the Path:end stimuli, as these two phases of Path are widely acknowledged to be represented asymmetrically in language (e.g. Ikegami 1987).
The asymmetry typically manifests as more frequent and more elaborated expressions of Path:end in language. On the other hand, Path:end as the more widely expressed standard option may also be encoded more simply than Path:beg (Kopecka & Ishibashi 2011:133), which seems to be the clearest manifestation of the asymmetry in the current data. As stated above, Path:end was typically encoded with rather simple OC constructions. Path:beg, instead, favoured more complex elaboration with deictic verbs or other references to the Direction in relation to the viewer. The difference between these phases of Path can be illustrated by two videos including the same elements but differing with respect to the phase of Path. A Path:end video of a woman walking to a tree had an SDI of 0.15, whereas a Path:beg video of the same woman walking away from the same tree had an SDI of 0.68. A central cause of the variation in the latter case was the choice of including a reference to the tree, the viewpoint of the speaker, or both, as in (12).
The difference between Path:end and Path:beg derived from differences in the use of OC+VC and VC, as neither of the classes includes clear instances of vertical motion and thus there was no notable use of OC+GC.
Variability was highest in the class of unbounded trajectories, the median SDI being 0.55. This can be accounted for by different strategies of description. First, there was variation in whether the motion situation was encoded as translocative or not. The unbounded trajectories, as expected, were most likely to be described as motion in an environment (e.g. kävelee metsässä ‘walks in the forest’).
Second, the use of only VC was more typical for unbounded trajectories than bounded ones. This mostly consisted of references to the camera or the speaker as the only Landmark. This is consistent with the explanation of Matsumoto et al. (2017:112) for the relatively frequent use of deictic PPs in scenes with motion in open space: phrases such as toward me tend to be expressed when there is scarcely any other Path information to encode. The use of OC+VC, instead, was rare in unbounded situations. Thus, in the case of unbounded trajectories, reference to a viewpoint appears to be an alternative to other strategies rather than an additional dimension to other kinds of spatial reference.
Third, the inclusion of GC was rather common due to several videos that feature a vertically aligned landmark, such as stairs or a hill, and thus evoke the use of adverbs and verbs encoding verticality. The video that produced the highest variability in this class and in the whole data (SDI 0.74) was encoded with all the above-mentioned strategies and all the FoR combinations occurring in the data. This video of a boy walking on a rock and coming towards the viewer was described, for example, as non-translocative, with VC, as in (13) with the viewer as the implicit Landmark for the adverb kohti ‘towards’, and with the combination OC+GC, as in (14).
On the other hand, the inclusion of vertical elements is not necessarily a factor causing extensive variation. Instead, the videos that include a very clearly vertically inclined landmark tended to be rather unanimously encoded with elements referring to verticality. In most cases, this led to the combination OC+GC being clustered in certain videos, while OC+VC was more scattered in the data. For example, the video of a woman climbing up a narrow path (see example 17) had an SDI of 0.32.
In summary, the features of the motion situations presented in the stimuli are one of the factors that affect the choice of FoRs. However, the fact that videos representing different trajectories are somewhat scattered on the SDI scale shows that this – as expected – is only one factor among many.
6. Variation as typologically determined
Of the levels of variation that can be recognised in the expression of motion, cross-linguistic variation has been widely explored, and language-internal variation less so. While focusing on the latter, I believe these viewpoints are most enlightening when combined. Thus, in this section, I will provide a cross-linguistic perspective by comparing the FoRs of Finnish spatial clauses (3,687) with those of three other, typologically distinct languages: Swedish (17 informants), French (17 informants), and Thai (14 informants); these languages have been analysed with the HSS framework by Blomberg (2014) using the same elicitation tool.
In all these languages, the use of only OC was the most typical option and other major classes consisted of combinations of OC with either VC or GC (see Figure 6). The VC and GC FoRs appeared to be additional dimensions that are more prone to variation between languages.
Finnish aligned with Swedish and French as regards the domination of OC, with all these languages having an approximate proportion of 70%. Thai combined the FoRs more than the other languages: the proportion of OC was the lowest (c.40%), and the OC+VC combinations were almost as frequent. The OC+VC+GC combinations were also most frequent in Thai (c.9% in Thai and c.0.5–1% in the other languages).
Regarding OC+VC and OC+GC, Finnish seems to take an intermediate position with respect to Swedish and French. With respect to OC+VC combinations, these languages were on a scale of c.10–20%. These combinations were most frequent in French, followed by Finnish. OC+GC combinations were most frequent in Swedish, followed by Finnish, and all three languages were on a scale of c.7–15%. In both OC+VC and OC+GC combinations, Finnish patterned together with French rather than with Swedish.
A set of linguistic factors is likely to explain some of these cross-linguistic differences. First, Thai differs from the other languages by using serial verb constructions that have a specific slot for deictic verbs, which also leads to more combinations of all three FoRs (Blomberg 2014:133). In Finnish, the OC+VC combination is used more than in Swedish, which is partly due to differences in verb semantics. Finnish has two verbs, mennä and kulkea, that can be translated as ‘go’, and, in addition, a standard Manner verb kävellä ‘walk’. The central difference between mennä and kulkea is the stronger directionality of the verb mennä; it is predominantly deictic.Footnote 17 Swedish gå, instead, corresponds both to ‘go’ and to a Manner verb meaning ‘walk’ and thus cannot be coded as clearly deictic.
The inclusion of a vertical GC is most typical in Swedish. Blomberg (2014:133) attributes the difference in OC+GC in French and Swedish mainly to the adverbial resources of Swedish. Adverbs such as uppför ‘up’ typically appear together with the expression of the Landmark object, as in Blomberg’s example (15).
Similarly, Finnish Direction adverbs ylös ‘up’, ylöspäin ‘upwards’, alas ‘down’ and alaspäin ‘downwards’ often accompany Landmark expressions, as in (16).
Another category that participates in the encoding of vertical motion in Finnish is the vertical Direction verbs that are neutral in relation to Manner: nousta ‘ascend’ and laskeutua ‘descend’. Swedish does not have purely directional verbs,Footnote 18 whereas in French the verbs monter ‘go up’ and descendre ‘descend’ are the standard way of expressing vertical Direction. In Finnish, directional verbs and adverbs can appear in the same sentence, as in (17), which creates a redundant expression of verticality and a pattern of semantic distribution (Sinha & Kuteva 1995).
The general perception arising from the cross-linguistic comparison is as follows: Thai clearly stands out due to a dominant syntactic pattern. Finnish, Swedish, and French, though possibly representing different types in their general pattern of encoding motion situations, show quite moderate variation with respect to the distribution of different FoRs. However, even a limited comparison shows the role of linguistic resources in affecting the use of FoRs in motion expressions. The FoRs vary according to different form-meaning patterns in languages. This is obvious in the case of Thai but also visible in how Finnish, Swedish, and French relate to each other.
With respect to the resources they typically use to express Path, the languages in this comparison differ from each other. Naidu et al. (2018), combining the results of earlier research and their own analysis, suggest that these languages would all represent distinct typological clusters. According to this account, Swedish relies primarily on adverbal forms in the expression of Path, French on verbs, Thai on serial verb constructions, and Finnish on adnominal forms, especially cases, thus possibly creating a cluster with other case languages such as Telugu. The status of Finnish remains an open issue and cannot be further elaborated in this article focusing on FoRs, but the cases and adpositions are certainly central in comparison to the other resources.
7. Conclusions
Recent studies on the use of FoRs in languages have raised questions about variation in the systems of spatial reference (e.g. Dasen & Mishra 2010, Bohnemeyer et al. 2014, Palmer et al. 2017). Rather than being defined by one factor (typically language or environment), intertwining effects of language, culture, and environment are posited in models such as the Sociotopographic Model by Palmer et al. (2017). The starting point of this article was a study design where most of the factors posited in the literature were standardised by recruiting participants from a demographically rather homogeneous group of speakers of the same language. Even with these premises, considerable variation in the elicited motion descriptions was discovered. The aim of the study was to analyse this variation from different viewpoints and to look for explanations beyond those usually acknowledged. The expressions of motion have been less studied from the point of view of FoRs than the expressions of static location.
The data consisted of motion descriptions elicited with the Trajectoire tool from 50 Finnish speakers. I analysed the motion descriptions in three different ways: first by individuals, then by the stimuli, and finally with respect to three other, typologically distinct languages. Variation proved to be extensive, and the analysis showed that individual preferences, as well as elements of the motion situations, are also factors that should be considered in connection with FoRs. A variety of linguistic resources was also linked to the use of FoRs as possible factors explaining language-specific preferences.
The object-centred FoR dominated rather clearly with respect to its representation in the data. The use of the geocentric FoR was, in line with the general tendency of a reduced geocentric FoR in urban societies (e.g. Pederson et al. 1998), only detected in vertical directions together with OC. Variation in the use of GC was detected both in the cross-linguistic comparison and between individual speakers of Finnish. In both cases, however, the variation was rather moderate.
The viewpoint-centred FoR, by contrast, is where the widest individual variation in the Finnish data was observed. There were informants who dismissed the viewpoint almost completely, and others who included it in more than half of their descriptions. The results also showed the effect of the elicitation situation, since the informants who relied most on VC tended to refer explicitly to the location of the camera or the viewer. This tendency stood out in the data as a specific individual strategy.
The stimuli, and the motion situations presented in them, were recognised as one source of variation. The SDIs calculated for each video showed tendencies of patterning according to the different types of trajectory. On average, unbounded trajectories and Path:beg produced the highest variability, especially due to the different VC strategies. Path:end, on the other hand, was typically expressed rather simply with respect to FoRs, and Path:mid and the complex trajectories represented intermediate variation. The difference between Path:beg and Path:end could be explained through the generally acknowledged asymmetry between source and goal and the future-oriented nature of human cognition: as it is typically more important to express where we are going than where we are coming from, it is logical that the beginning of Path would be expressed in a marked way and the end of Path in a simpler, unmarked way (e.g. Lakusta & Landau 2005).
The cross-linguistic comparison with the data on Swedish, French, and Thai (Blomberg 2014) suggested that the use of FoRs does not very clearly follow the general patterns for encoding motion. Finnish patterned together with both Swedish and French in the domination of OC and, somewhat unexpectedly, seemed to be closer to French than Swedish in the OC+VC and OC+GC combinations. The results were linked to certain linguistic resources and differences in the form-meaning patterns in the languages compared; however, a more thorough account of the cross-linguistic variation of FoRs in motion situations is needed in the future.
Motion is a complexly encoded domain in Finnish, which is also reflected in the use of the FoRs. The variety of different means and strategies used to describe a set of relatively simple situations shows the need for further studies. To reach a more thorough understanding of the phenomenon, application of different experimental methods would be in order. The central linguistic resources should also be analysed with various kinds of data. The study also shows the need for more theoretical discussion on FoRs in motion situations and a consistent conceptual framework. For example, the analysis showed clear differences in the expression of bounded and unbounded trajectories, and this is an argument in favour of keeping them conceptually apart as in the HSS framework.
Acknowledgements
I wish to thank the three anonymous NJL reviewers for their valuable comments. I am also grateful to several colleagues for helpful discussions during the writing process of this article. The work was funded by the Kone Foundation, grant 201609105.