Introduction
Received Pronunciation (RP) is one of the most widely recognized, and frequently discussed, varieties of British English. Despite this, there exists relatively little variationist research on language use among RP speakers (Badia Barrera, 2015; Fabricius, 2000, 2002, 2007; Halfacre & Khattab, 2019 are notable exceptions). According to Fabricius (2018), this is due to a conflation between RP as the standard variety of British English (that is, an abstract construct used to organize ideologies of linguistic “correctness” and “prestige” in the UK; see Agha, 2003; Fabricius & Mortensen, 2013) and RP as the habitual variety of a sociologically defined group of speakers (e.g., those who are upper-class and/or privately educated). In this paper, we contribute to the study of RP as it is actually used among young elite speakers in Britain today. Specifically, we focus on the vowel system of modern RP and compare realizations of short vowels by upper-class speakers in London with those of their working-class counterparts. We do so in order to identify the role that linguistic variation continues to play in marking social distinction in the UK. This is important because of how discourses of social class in Britain have changed over the past 15 years.
While popular discussions during the final decades of the twentieth century emphasized a supposed fracturing of class hierarchies and the rise of a so-called “meritocracy” (Adonis & Pollard, 1997; Cannadine, 1999; Turner, 2013), recent scholarship has highlighted the persistence of entrenched class divisions and an increased awareness of how these divisions structure contemporary British society (e.g., Biressi & Nunn, 2013; Friedman & Laurison, 2020; Savage, Devine, Cunningham, Taylor, Li, Hjellbrekke, Le Roux, Friedman, & Miles, 2013; Williams, 2006). We seek to map the effect that such changes have had on language use and to provide an updated picture of elite speech in Britain. Specifically, we explore the ways in which class-linked embodied behaviors may be implicated in current patterns of sound change.
We begin in the next section with a brief overview of developments in RP over the course of the twentieth century, focusing in particular on reports of levelling in RP toward other southeastern British varieties. We go on to describe how from around 2010, discussions of possible changes in “posh speech” began to appear in British popular media. We take these media discussions as evidence of the renewed salience of linguistic markers of eliteness (see Fabricius, 2018). The bulk of our analyses are then devoted to investigating the validity of these media depictions, examining whether speech among young elites in London corresponds to these popular representations. To do so, we analyze the speech of cast members of the reality television show Made in Chelsea (which we take as a proxy for modern RP) and compare it to the speech of cast members of the show The Only Way Is Essex (which we take as a proxy for the generational descendant of Cockney). We describe how differences in vowel realizations across the two shows are consistent with the speakers adopting distinct articulatory styles, or embodied settings. We close by discussing the ramifications of our findings for current understandings of modern RP and for the relationship between language and social distinction more generally.
The rise, fall, and rise of RP
The accent that eventually would come to be named RP first began to emerge in the eighteenth century as part of a broader movement to standardize English in Britain (Mugglestone, 2007). While ostensibly driven by a desire to create a “neutral” and “non-local” style of speech (that is, one that would be adopted by people regardless of their region of origin), RP was from its outset explicitly modelled on the accent used by educated, upper-class speakers in and around London. RP has thus always been characterized by a central tension: it is a social and regional accent of a particular population of speakers (upper-class individuals from the southeast) as well as a codified, non-regional standard that serves as an emblem of sociolinguistic prestige (Agha, 2007; Fabricius, 2018). It is not accidental that it was a supralocal accent of educated London English that was chosen to be the supposedly “neutral” standard. The push for standardization was undergirded by language ideologies that positioned educated London English as the “most correct” accent, due primarily to its association with the “best” London society. While this push to position RP as a standard emerged in the eighteenth century, there is evidence that it took another 100 years before it was firmly implanted, with its rise supported by the establishment of a network of fee-paying boarding schools (called “public schools”), places where young upper-class men from around the country were sent to study.
Indeed, the earliest phonetic descriptions of what would later be labelled RP used the term Public School Pronunciation (PSP; Jones, 1909, 1917), identifying it as the speech of “the families of Southern English persons whose men-folk have been educated at the great public boarding-schools” while also conceding that the accent can be found among former boarding-school pupils not from the South of England (see also Collins & Mees, 1999; Jones, 1917:viii).
This duality in understandings of RP has been evident since the earliest scholarly descriptions of the accent. Jones (1926) dropped the label PSP from the second edition of his English Pronouncing Dictionary, choosing instead to revive Ellis’ (1869) earlier term Received Pronunciation in order to foreground the “received” (i.e., generally accepted) nature of the accent as a model for learners of English. The tradition of treating RP as an idealized model (or construct; Fabricius & Mortensen, 2013) for language learning was sustained by Gimson (1962) in his Introduction to the Pronunciation of English, which became the classic reference for the phonetic description of “standard English” (see Roach, 2004 for a summary). This had the effect of conflating three different phenomena under the label RP within the scholarly literature: a norm for language teaching, a Platonic ideal of “correct speech,” and a sociolinguistic variety used by a defined population of speakers (Fabricius, 2018:41; see also Wells, 1994, 1997).
The first detailed discussion of variation in RP appears in Wells (1982), where he made a distinction between upper-class RP (U-RP, a traditional form), mainstream RP (the generational descendant of U-RP), and adoptive RP (the version spoken by people who did not speak RP growing up). Innovations that Wells (1982) identified in mainstream and adoptive RP include the fronting of the vowels in the goat and goose lexical sets ([oʊ] to [əʊ] and [uː] to [ʉː], respectively), tensing of the unstressed happy vowel ([ɪ] to [i]), lowering of trap and kit, /t/-glottaling in word-final, pre-consonantal position (e.g., le[ʔ] me rather than le[t] me), and the monophthongization of centering diphthongs (e.g., [ɔː] for [ʊə] in a word like sure), among others (Kerswill, 2007). While Wells was somewhat ambivalent about whether these differences are chronological, variationist studies over the ensuing decades have tended to argue for a diachronic explanation, finding evidence that RP was levelling toward a broader, supra-local Southeastern norm (on /t/-glottaling, see Badia Barrera, 2015; Fabricius, 2000; on happy tensing: Fabricius, 2002; on short front vowel lowering: Fabricius, 2007; Harrington, Palethorpe, & Watson, 2000; on yod coalescence: Hannisdal, 2006; on rhotic quality: Fabricius, 2017; on goose fronting: Bauer, 1985; Jansen & Mompean, 2023; Kerswill, 2001). Moreover, RP was reported as participating in the anticlockwise vowel shift.
This shift (which involves the lowering of kit, dress, and trap, the raising of strut and lot, and the fronting of goose and foot) has been shown to be underway for a number of decades across the entire southeast region of England, and affects all sociolects along the class continuum (Fabricius, 2007, 2019; Tollfree, 1999; Torgersen & Kerswill, 2004; Trudgill, 1986, 2004).
The apparent generational levelling in RP coincided with the rise of an identifiable, intermediate form, Estuary English (Rosewarne, 1994), which was a focus of attention in both scholarly and popular conversations. Its emergence was seen by many as symbolizing a broader shift toward a more meritocratic society, one in which social mobility is possible and linguistic differences are no longer a hindrance to social advancement. Considering the simultaneous rise of Estuary English and levelling in RP, many scholars suggested that the changes in RP could be the result of a pressure to avoid the overt upper-class connotations of certain RP features, and instead to adopt a style of speaking that was more in tune with the supposed breakdown of class distinctions in late-twentieth century Britain (Burridge, 2004; Harrington, 2007; Harrington et al., 2000; Wells, 1994; see also Fabricius, 2018; Jansen & Mompean, 2023). In other words, the expansion of the middle class in the final decade of the twentieth century was seen as one of the drivers of linguistic levelling in RP, presumably brought about by speakers who in previous decades would have used traditional RP forms shifting to a supra-local, Estuary-like standard instead (Kerswill, 2001; Cole & Strycharczuk, 2024).
Yet since the turn of the twenty-first century, and particularly since the financial crisis of 2008, popular discourse in Britain about class and its relationship to language has changed dramatically. Recent sociological studies detailing the persistence of rigid class hierarchies and the continued relevance of entrenched (hereditary) privilege (e.g., Blanden, Gregg, & Machin, 2005; Buscha & Sturgis, 2018; Clark, 2014; Friedman & Laurison, 2020; Wakeling & Savage, 2015) have received major press coverage, leading to a renewed attention to class in the public sphere. As Fabricius (2018) noted, this has been accompanied by numerous films and television shows which satirize specific forms of “elite” practice. Some of these parodies pay particular attention to language, and to a purportedly new “posh” style of speech that has emerged over the past 15 years. Beginning in 2010, for example, British comedian Matt Lacey posted a series of parodic videos to YouTube entitled “Gap Yah.” The videos depict the adventures of a young upper-class man (“Orlando”) who is travelling around the world during his “gap year” between university and the start of his career. In these videos, Orlando uses a very distinctive speech style featuring heavily backed and lowered short front vowels (including his pronunciation of the word “year” as [jɑ:]), accompanied by a lowered and immobile jaw. These lowered vowels were taken up as emblematic of the poshness Lacey was parodying, used, for example, in the title of his follow-up book (The Gap Yah Plannah from 2011) and by reporters commenting on his work (described, for example, as “saariously funny” in The Times in 2010).
More recently, a review of American actor Kristen Stewart’s performance as the title character in the 2021 Princess Diana biopic Spencer described her British accent as “entirely convincing, hitting the exact self-conscious, detached-jaw, pseudo-estuary drawl that posh people have adopted now that they realize how silly Received Pronunciation sounds” (Heritage, 2021). The precise linguistic composition of this drawl is described in more detail in a video posted to TikTok in August 2022 by British comedian Russell Kane. In the video, Kane complained about what he described as “posh people’s squashed vowels,” comparing, for example, his own (working-class) pronunciation of the British bookstore chain Waterst[aʊ]ne’s to the “posh” pronunciation: Waterst[ɨ]n’s. Kane went on to comment, “Is there something wrong with your mouth? Is it so posh that it’s become, that it’s got a British osteoporosis of the lip?” Kane’s description of posh oral rigidity (osteoporosis of the lip) resonates with the description of Stewart’s “detached jaw” and Lacey’s backed, open vowels.
Similar mediatized linguistic manifestations of elite and upper-class speakers have been noted and analyzed empirically in North America. Pratt and D’Onofrio (2017) presented an analysis of parodic performances of elite Californians in the sketch television show Saturday Night Live. Their analysis compared the visible articulatory settings and associated acoustic manifestations of the actors’ performances in and out of the Californian characters. They reported a visibly backed, open-jawed articulation accompanied by an overall reduction in the actors’ vowel space. Pratt and D’Onofrio argued that the actors strategically deploy an open jaw articulatory setting as a way of enacting and physically embodying a particular type of elite Californian. In an earlier study, Kroch (1996) made a comparable observation about the speech style of Philadelphia’s historically elite families. The upper-middle class Philadelphians were characterized as exhibiting a “relaxed articulation” that “conveys a strong sense of entitlement.” In his analysis of the speech data, Kroch observed that the “relaxed” speech style correlated with a slower speaking rate (a so-called “drawling quality”) as well as a laryngealized voice quality, known locally as “Main Line Lockjaw.” Kroch further described how the elite speakers were “phonetically less extreme” in their realization of ongoing changes in the Philadelphia vowel system. In practice, this translates to less raising and fronting among the elite speakers, with vowels positioned generally backer and lower in the vowel space overall, a pattern that is similar to that noted for elite speakers on the West Coast of the US as well as those in the UK.
Taken together, it would appear that there is an embodied projection of elite status shared across RP speakers, elite Californians, and upper-middle class Philadelphians via the use of an articulatory setting that is iconically linked to entitlement and ease (cf. Levon & Holmes-Elliott, 2024 for a fuller discussion).
As these examples illustrate, both popular discussions and empirical analyses of “posh” speech styles focus on the use of a particular embodied posture (lowered jaw) combined with a specific realization of vowels (lowered and centralized) as new linguistic emblems of eliteness. The current study investigates the extent to which these metalinguistic comments about RP reflect sociolinguistic reality. It is interesting to note that the features we find in media portrayals can be seen as extensions (or exaggerations) of patterns that were described as incipient changes in the literature on RP in the twentieth century, such as the lowering and backing of trap and kit (e.g., Fabricius, 2007) and the maintenance of an immobile jaw (see Agha, 2003, 2007). In the remainder of this article, we examine whether the linguistic style depicted in the media reflects actual patterns of use among elite speakers. If it does, this would provide evidence for a further stage in the development of (modern) RP, one in which the trend toward supra-local convergence in the 1990s is reversing in favor of a newly distinct elite articulatory style.
Data and methods
To investigate the speech of contemporary elite speakers as compared to their working-class counterparts, we focus on variation in vowel realizations in two popular British reality television shows:
• Made in Chelsea (Chelsea), based in the hyper-affluent district of Chelsea in West London, whose speakers are roughly representative of modern RP (Fabricius, 2018), and
• The Only Way is Essex (Essex), based in Essex in the suburban east of London, where people speak a variety that has its roots in the working-class Cockney accent of London’s East End (Cole, 2022; Fox, 2015).
Both Chelsea and Essex are so-called “engineered reality” shows that follow a group of twentysomethings in their day-to-day lives. While the scenarios on the shows are orchestrated, the interactions between cast members are not scripted, and the cast engage in spontaneous, naturally occurring speech. Together, the shows therefore provide us with a useful and accessible source of data representing two regionally, socially, and linguistically distinct speech communities within the Greater London area, corresponding to contemporary versions of RP (Chelsea) and Cockney (Essex). We choose to focus on speakers in the London area given prior claims about the potential convergence between RP and other Southeastern varieties (e.g., Kerswill, 2001, 2007) and because, historically, RP phonology is based on a Southern British model (Mugglestone, 2007). Thus, while language use in the shows may not correspond to a fully unselfconscious “vernacular” style, the shows represent a valuable resource for identifying the linguistic features associated with social class positionings in contemporary Britain.
We extracted 82 useable scenes from the first two series of Chelsea (34 scenes) and Essex (48 scenes) for analysis. Together, the scenes totaled just under 6.5 hours of speech and featured 30 central cast members of both shows: 14 speakers in Chelsea (7 women, 7 men) and 16 speakers in Essex (10 women, 6 men). Scenes were taken from high-definition downloaded files of the programs and were only selected if they did not contain any music or other background noise. All speech was transcribed and then forced-aligned using the Forced Alignment and Vowel Extraction (FAVE) suite (Rosenfelder, Fruehwald, Evanini, & Yuan, 2011). FAVE-Extract was used to obtain F1, F2, and F3 measurements at the mid-points of all stressed tokens (n = 4265) of 11 Southern British English monophthongs: caught, cot, dress, fleece, foot, goose, kit, nurse, palm, strut, and trap. Given prior research on goose-fronting in Southern British English (Holmes-Elliott, 2015), the goose class was subdivided into vowels before laterals (ghoul) and those in other environments (goose). Since we are interested in both the placement of individual vowels and the size and shape of the overall vowel space, we followed the vowel normalization methods described in D’Onofrio, Pratt, and Van Hofwegen (2019). Raw F1 and F2 values (in Hz) of all vowels were first converted to a Bark scale (Traunmüller, 1990) and then normalized using the formant-intrinsic version of the Nearey single-log mean normalization method (Nearey, 1978) via the vowels package (version 1.2-2, Kendall & Thomas, 2023) in R (version 4.3.2, R Core Team, 2023).
We opted for the Nearey normalization method since it allows for more robust comparison of inter-speaker differences in the overall size and shape of the vowel space (Barreda & Nearey, 2018; D’Onofrio et al., 2019; Pratt, 2020) and has been shown to perform as well as other normalization techniques on British English data (Fabricius, Watt, & Johnson, 2009).
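The two-step procedure described above (Bark conversion followed by formant-intrinsic Nearey normalization) can be illustrated with a minimal Python sketch. It is not the code used in the study, which relied on the vowels package in R. The function names and the token layout are hypothetical, and two details are assumptions: that each speaker's log-mean is taken over all of that speaker's tokens, and that the result is returned as an anti-log (exponentiated) value.

```python
import math
from collections import defaultdict

def hz_to_bark(f_hz):
    """Convert a frequency in Hz to Bark (Traunmüller, 1990)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def nearey1_formant_intrinsic(tokens):
    """Formant-intrinsic single-log-mean normalization on Bark values.

    `tokens` is a list of dicts with keys 'speaker', 'vowel', 'F1', 'F2'
    (formants in Hz). This layout is hypothetical, chosen for illustration.
    Each formant is normalized against the speaker's own mean log value
    for that formant only (hence "formant-intrinsic").
    """
    # Collect log-Bark values per speaker and per formant.
    logs = defaultdict(lambda: {"F1": [], "F2": []})
    for t in tokens:
        for fm in ("F1", "F2"):
            logs[t["speaker"]][fm].append(math.log(hz_to_bark(t[fm])))

    # Speaker-specific mean of the logs, separately for F1 and F2.
    # Assumption: the mean is taken over tokens, not over vowel-class means.
    means = {
        spk: {fm: sum(vals) / len(vals) for fm, vals in by_fm.items()}
        for spk, by_fm in logs.items()
    }

    out = []
    for t in tokens:
        row = dict(t)
        for fm in ("F1", "F2"):
            centered = math.log(hz_to_bark(t[fm])) - means[t["speaker"]][fm]
            # Assumption: report the anti-log of the centered value.
            row[fm + "_norm"] = math.exp(centered)
        out.append(row)
    return out
```

Because each speaker's values are centered on that speaker's own log-mean, differences in the overall scale of the formants between speakers are removed, while the relative size and shape of each speaker's vowel space is retained.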
The vowel system of modern RP
We begin by examining the placement and distribution of monophthongs in Chelsea and Essex to obtain a first approximation of the current configuration of the vowel systems of modern RP and Cockney, respectively. These are plotted in Figure 1. Beginning with speakers in Essex (light shading, dashed lines), we see a system that is very similar to the one found for the youngest cohort of speakers in Cole’s (2021) description of contemporary Cockney speech in Essex. The close correspondence in the placement of the vowels between speakers in Essex and the young speakers in Cole (2021) is not surprising given that they are of a similar age and social background (i.e., contemporary Cockney speakers in their 20s) and supports our use of speech in Essex as an approximation of modern-day Cockney.
Among Chelsea speakers (dark shading and solid lines), we find a very different pattern. fleece anchors the high front of the vowel space, but both kit and dress sit significantly lower than in Essex. dress, in particular, is positioned very low in the vowel space, nearly overlapping with the position of trap. F1 values for trap in Chelsea are similar to those in Essex, though trap sits further back along the bottom axis in Chelsea. When compared to Essex, nurse, strut, and palm in Chelsea all occupy a central position, with nurse lowering, strut raising, and palm raising and fronting to converge in a similar low central area of the vowel space. This converged nurse-strut-palm space functions as the bottom anchor of the back diagonal, and is clearly distinguished from (non-merged) cot and caught, which themselves are raised and backed in Chelsea as compared to Essex. goose is more fronted in Chelsea than in Essex, nearly approaching the position of fleece.
Regression modelling confirms the patterns visible in Figure 1. Models testing the relative position of individual vowels in relation to fleece across Chelsea and Essex demonstrate that dress and nurse are both lower (higher normalized F1), strut and palm are both higher (lower normalized F1), dress, trap, nurse, and cot are all backer (lower normalized F2), and foot is fronter (higher normalized F2) in Chelsea than in Essex (see Appendix Tables A1–A4 for model details and results of by-vowel pairwise comparisons). The collective effect of these individual vowel differences is a general compression in the lower half of the Chelsea vowel space. Indeed, calculations of the area of the polygon connecting dress, trap, strut, palm, and cot in Chelsea versus Essex (using the densityarea package in R; version 0.1.0, Fruehwald, 2023) show that the area of the lower half of the Chelsea vowel space is .157 (in Nearey-normalized values) while the lower half of the Essex vowel space has an area of .169. Vowel space area calculations thus again confirm the visual impressions from Figure 1.
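As a rough illustration of what a polygon-area comparison involves, the area enclosed by a set of vowel means can be computed with the shoelace formula. The Python sketch below is an assumption-laden stand-in: it does not reproduce the densityarea package or its API, the function name is hypothetical, and the coordinates in the usage lines are invented placeholders rather than values from the study.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    `points` is a list of (x, y) pairs, e.g. (normalized F2, normalized F1)
    vowel means, listed in order around the perimeter (either direction).
    """
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical (F2, F1) means for dress, trap, strut, palm, cot,
# ordered around the perimeter; these are NOT the study's values.
lower_space = [(1.20, 0.95), (1.10, 1.25), (0.95, 1.20), (0.85, 1.15), (0.75, 0.95)]
area = polygon_area(lower_space)
```

The vertices must be supplied in perimeter order; listing them in an arbitrary order can produce a self-intersecting shape for which the formula no longer gives the enclosed area.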
To gain a better understanding of these patterns, we compare the vowel realizations in Chelsea and Essex to those of RP and Cockney speakers in earlier generations. Figure 2a–c present the average positions of the short vowels in Chelsea and Essex in relation to those found for a speaker of “traditional” RP born in 1909 (taken from Deterding, 1997), a speaker of Cockney born in 1950 (taken from Mott, 2012), and a speaker of “modern” RP born in 1980 (taken from Fabricius, 2007). The plots were generated by taking the non-normalized average F1 and F2 values reported in Fabricius (2007; traditional RP and modern RP), in Mott (2012; traditional Cockney), and in our current dataset (Chelsea and Essex), and normalizing them using the vowel-extrinsic and formant-intrinsic S-centroid normalization method introduced by Watt and Fabricius (2002).
This method involves taking average non-normalized F1 and F2 values for the high-front corner of the vowel space (normally fleece), the bottom anchor (either strut or trap depending on the speaker/variety), and a theoretical high-back corner, whose F1 and F2 are both set equal to the F1 of the high-front corner (i.e., fleece). These three points are then used to calculate the “center of gravity” (or S-centroid). Individual vowel classes can then be plotted as a horizontal (F2) and vertical (F1) ratio from this central value.
The S-centroid method is not intended as a model of psychoacoustic reality or as a means for examining the precise placement of individual vowels. Instead, the method allows us to plot and visually compare the systemic configuration of disparate datasets without necessarily having access to all individual tokens in those datasets. To generate Figure 2a–c, average F1 and F2 values for each dataset across the three corners of the vowel space were calculated, then averaged to calculate the centroid. Average F1 and F2 values for each vowel class were then divided by the respective centroid values to generate S-centroid ratios (see Appendix Table A5 for full calculations). Plotting these ratios in Figure 2a–c provides a heuristic to visually contrast the systems of Chelsea and Essex with those of earlier versions of RP and Cockney.
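The S-centroid calculation described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure as summarized in the text, not the authors' own calculation (for which see Appendix Table A5); the function names, the dictionary layout, and the choice of trap as the default bottom anchor are hypothetical.

```python
def s_centroid(fleece, bottom):
    """Return (S_F1, S_F2), the centroid of the three corner points.

    `fleece` and `bottom` are (F1, F2) means in Hz for the high-front
    corner and the bottom anchor. The theoretical high-back corner has
    both F1 and F2 set equal to the F1 of the high-front corner.
    """
    i_f1, i_f2 = fleece
    a_f1, a_f2 = bottom
    u_f1 = u_f2 = i_f1  # theoretical high-back corner
    s_f1 = (i_f1 + a_f1 + u_f1) / 3.0
    s_f2 = (i_f2 + a_f2 + u_f2) / 3.0
    return s_f1, s_f2

def s_ratios(vowel_means, fleece_key="fleece", bottom_key="trap"):
    """Express each vowel mean as (F1/S_F1, F2/S_F2).

    `vowel_means` maps a vowel label to its (F1, F2) mean in Hz.
    The default keys are hypothetical; the bottom anchor may be
    strut rather than trap, depending on the speaker or variety.
    """
    s_f1, s_f2 = s_centroid(vowel_means[fleece_key], vowel_means[bottom_key])
    return {v: (f1 / s_f1, f2 / s_f2) for v, (f1, f2) in vowel_means.items()}
```

Because only the class means are needed, the same function can be applied to published formant averages from different studies, which is what makes the method suitable for the cross-dataset comparison in Figure 2.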
Figure 2a offers a direct comparison between the short vowels of traditional RP (corresponding to Wells’ [1982] U-RP; black boxes and solid lines) versus Cockney (gray boxes, dot-dash lines), using the data from Deterding (1997) and Mott (2012), both based on word-list data. In Figure 2a, the classic distinction between the two varieties is evident: dress and trap are positioned much higher along the front diagonal in RP than in Cockney, while RP cot and foot are lower and more central.
Figure 2b compares the short vowel systems of three generations of RP: traditional RP (black boxes, solid lines), modern RP (light gray boxes, dot-dash lines, with data taken from Fabricius’ [2007] interview corpus), and our Chelsea data (white boxes, dotted lines). The contrast between traditional RP and modern RP in Figure 2b illustrates the substantial changes in RP over the course of the twentieth century. dress is substantially lower, approximating the position in Cockney (cf. Figure 2a). Similarly, trap is low, while strut, cot, and foot are all raised. Fabricius (2019), among others, has labelled this change in RP the anticlockwise checked vowel shift (Hawkins & Midgley, 2005; see also Trudgill, 1986, 2004; Wikström, 2013). According to Fabricius, this shift was presumably initiated by a lowering and backing of trap, which then dragged dress into a lower position and caused the upward raising and inward rotation of strut, cot, and foot. The shift is schematically represented in Figure 3. For our present purposes, it is important to note that these changes, particularly along the front diagonal, had the effect of bringing modern RP vowels closer to their Cockney counterparts. In this regard, changes in the short vowels illustrate the kind of supra-local levelling that took place in RP in the second half of the twentieth century.
On comparing the system of modern RP to our Chelsea speakers, we find similar positions of the vowels. With the possible exception of kit, we see no evidence for any further anticlockwise rotation in the system. Instead, the most striking difference is a shrinking in the vowel space with trap raising, foot lowering, and dress, strut, and cot all centralizing. It is important to note here that a proportion of this difference in vowel space must be attributable to the particular speech modes used in the recordings from each dataset. Deterding’s (1997) data is based on a word list, Fabricius’ (2007) data on sociolinguistic interviews, and our Chelsea data on spontaneous speech. The phonetic consequences of speech style are well documented. Styles that are more careful tend toward hyperarticulation; targets become more peripheral and the effect manifests acoustically in an enlarged vowel space (Lindblom, 1990; Moon & Lindblom, 1994). It is therefore safe to assume that a portion of the observed difference between modern RP and Chelsea is driven by differences in types of speech analyzed. However, when we compare Chelsea to Essex (in Figure 2c), where both datasets derive from spontaneous speech, we find that the vowel space in Chelsea is significantly smaller than in Essex. It is this difference which leads us to argue that the effects observed in Chelsea are about more than just mode of speech. We interpret this difference as evidence of a sociolinguistic phenomenon, whereby Chelsea speakers are contracting their vowel spaces, in relation both to their generational predecessors who speak modern RP and to their contemporaries in Essex.
Putting the three panels of Figure 2 together, we can see that the two fairly distinct vowel systems in RP and Cockney (Figure 2a) became more similar by the final decades of the twentieth century. The levelling of the difference between the systems appears to have been driven by an anticlockwise shift in the short vowels of RP, resulting in modern RP monophthongs closely approximating the position of their Cockney counterparts (Figure 2b). Following this shift, there is no evidence of further anticlockwise rotation. Instead, we find a general shrinking of the vowel space, such that the short vowels in Chelsea are all more centralized than in the modern RP speech described by Fabricius (2007) (Figure 2b), though we acknowledge that different speech modes across the datasets may exaggerate this effect. A comparison with the position of the vowels in Essex (Figure 2c), however, clarifies things, and shows that Chelsea short vowels have not moved away from their levelled Southeastern positions. Instead, the entire system is more centralized and compressed in Chelsea than in Essex, supporting the idea that a socially relevant change has taken place among Chelsea speakers. In the next section, we consider how best to account for the patterns we find in RP today.
Vowel centralization as articulatory setting
The overall shrinking of the vowel space among Chelsea speakers is reminiscent of a particular articulatory setting described by Laver (Reference Laver1980:49), “in which the centre of the mass of the tongue remains more or less in neutral position, and the segmental articulations tend not to depart radially very far from the centre of the articulatory space.” Laver termed this type of articulation lax voice, or a style of speaking associated with “lower subglottal air pressure; a slightly lowered larynx; an unconstricted pharynx … inhibited, minimized radial movements of the relaxed, relatively flat-surfaced tongue in segmental articulation; minimal activity of the lips; and a relatively immobile jaw” (Reference Laver1980:155). Laver stated, following Honikman (Reference Honikman, Abercrombie, Fry, P., Scott and Trim1964), that lax voice is typical of RP in British English. The close correspondence between Laver’s description of lax voice and the patterns found in Chelsea, combined with Honikman’s observation about the prevalence of lax voice in RP, leads us to ask whether articulatory setting, and specifically the use of lax voice, can account for the differences we observe between Chelsea and Essex.
In the absence of physical articulatory measurements (e.g., electromagnetic articulography), we use two acoustic diagnostics to determine whether the different vowel space configurations in Chelsea versus Essex could be linked to a difference in articulatory setting. Prior research indicates that changes in articulatory setting affect vowel classes differently (Laver, Reference Laver1980; Nolan, Reference Nolan1983). We therefore consider whether the differences observed across vowel classes are consistent with the adoption of a lax voice setting in Chelsea (and its absence in Essex). The first diagnostic involves inspecting the position of the high-front, high-back, and low vowels when they are plotted using a modified version of the vowel-intrinsic Bark Difference method (Kendall & Thomas, Reference Kendall and Thomas2018; Thomas, Reference Thomas2011). These results are presented in Figure 4, which plots the average location of the 11 monophthongs split by show (Chelsea in black and Essex in gray). Triangles surround Nolan’s (Reference Nolan1983) three main vowel classes: high-front vowels (fleece, kit, and dress; solid line), high-back vowels (goose/ghoul, foot, and caught; dashed line), and low vowels (trap, nurse, strut, cot, palm; dotted line).
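For readers unfamiliar with Bark Difference plotting, the following is a minimal sketch of the standard (unmodified) calculation, assuming the Traunmüller (1990) Hz-to-Bark conversion and the usual Z3 − Z1 and Z3 − Z2 dimensions. It does not reproduce the modified version used for Figure 4, whose details are not given in this section. The function names and the formant values are hypothetical and serve only as illustration.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark using the Traunmueller (1990) formula."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_difference(f1_hz: float, f2_hz: float, f3_hz: float) -> dict:
    """Return vowel-intrinsic Bark Difference coordinates for one token.

    height   = Z3 - Z1 (larger values correspond to higher vowels)
    backness = Z3 - Z2 (smaller values correspond to fronter vowels)
    """
    z1, z2, z3 = (hz_to_bark(f) for f in (f1_hz, f2_hz, f3_hz))
    return {"height": z3 - z1, "backness": z3 - z2}


# Hypothetical formant measurements (Hz) for a single kit-like token.
example = bark_difference(f1_hz=450.0, f2_hz=1900.0, f3_hz=2700.0)
print(example)
```

Because each token is scaled against its own F3, the resulting coordinates do not depend on pooling information across a speaker's other vowels, which is what makes the method vowel-intrinsic.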
According to Nolan (Reference Nolan1983), the acoustic consequences of lax voice (and its accompanying lowered larynx and unconstricted pharynx) would translate in Figure 4 to an apparent lowering of high-front vowels, minimal change among high-back vowels, and raising among low vowels (see also Thomas, Reference Thomas2011). In Figure 4, we see that of the three high-front vowels (solid triangle), kit and dress are indeed substantially lowered in Chelsea (black labels) as compared to Essex (gray labels). Within the high-back area (dashed triangle), excluding goose and foot, we do not find a consistent pattern among the remaining vowels in this class, a finding that chimes with Nolan’s (Reference Nolan1983) prediction of minimal change in the high-back area. Among the low vowels (dotted triangle), we have a somewhat more consistent pattern, with substantial raising of strut, palm, and cot in Chelsea as compared to Essex, though we also find backing and slight lowering of nurse, and, to a lesser extent, trap. Though not perfectly aligned, we suggest that the results in Figure 4 are generally consistent with the proposal that Chelsea speakers are using a lax voice articulatory setting.
To test this claim further, we turn to a second diagnostic of articulatory setting: differences across vowel classes in the ratio of (non-normalized) F2 to F1. Nolan (Reference Nolan1983:190) argued that F2:F1 ratio is a more robust correlate of articulatory setting than simple F1 and F2 formant comparisons, since it is relatively independent of physiological differences among speakers. For lowered larynx as compared to a “neutral” setting, Nolan (Reference Nolan1983) reported a lower F2:F1 ratio in the high-front vowels, a higher ratio in the high-back vowels, and a higher ratio in the low vowels. We used a linear mixed-effects regression model to test this prediction in our data, with F2:F1 ratio as the outcome variable and vowel class, show, and their interaction as predictors (speaker included as a random intercept). Results indicate a significant effect of the interaction between vowel class and show (F = 37.89, p < .001). Post hoc comparisons confirm that the differences in F2:F1 ratio between Chelsea and Essex in the high-front (t = −5.362, p < .001) and high-back (t = 4.653, p < .001) regions are significant, with Chelsea showing a lower F2:F1 ratio in the high-front region and a higher ratio in the high-back one. This difference is depicted graphically in Figure 5. With the exception of the low vowels (where no difference in F2:F1 ratio for Chelsea and Essex is found), the results depicted in Figure 5 are consistent with the use of a lax voice articulatory setting among Chelsea speakers. We therefore take these results as a second piece of evidence in support of the proposal that what distinguishes Chelsea and Essex is a difference of articulatory setting.
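As an illustration of the outcome variable only, the sketch below computes per-token F2:F1 ratios from raw formant values in Hz and averages them by show and by Nolan's vowel classes as listed above. It is a descriptive summary, not the mixed-effects model reported here (which additionally includes a random intercept for speaker). All token values are invented for the example, and the names `VOWEL_CLASS` and `mean_ratio_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Vowel-to-class mapping following the three classes described in the text.
VOWEL_CLASS = {
    "fleece": "high-front", "kit": "high-front", "dress": "high-front",
    "goose": "high-back", "foot": "high-back", "caught": "high-back",
    "trap": "low", "nurse": "low", "strut": "low", "cot": "low", "palm": "low",
}


def f2_f1_ratio(f1_hz: float, f2_hz: float) -> float:
    """Ratio of non-normalized F2 to F1 for a single token."""
    return f2_hz / f1_hz


def mean_ratio_by_group(tokens):
    """Average the F2:F1 ratio within each (show, vowel class) cell.

    `tokens` is an iterable of (show, vowel, f1_hz, f2_hz) tuples.
    """
    grouped = defaultdict(list)
    for show, vowel, f1_hz, f2_hz in tokens:
        grouped[(show, VOWEL_CLASS[vowel])].append(f2_f1_ratio(f1_hz, f2_hz))
    return {cell: mean(ratios) for cell, ratios in grouped.items()}


# Invented tokens for illustration only; these are not the study's data.
tokens = [
    ("Chelsea", "kit", 450.0, 1800.0),
    ("Chelsea", "dress", 600.0, 1800.0),
    ("Essex", "kit", 400.0, 2000.0),
    ("Essex", "dress", 500.0, 2000.0),
    ("Chelsea", "foot", 400.0, 1200.0),
    ("Essex", "foot", 400.0, 1000.0),
]
means = mean_ratio_by_group(tokens)
for cell, value in sorted(means.items()):
    print(cell, round(value, 2))
```

Cell means of this kind correspond to what a plot such as Figure 5 would display; the significance tests themselves require the regression model described above.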
Summarizing our results so far, we find no evidence of a further anticlockwise rotation of the short vowels in modern RP. Rather, the system appears to be in the same “levelled” supra-local configuration it was in 15 years ago (Fabricius, Reference Fabricius2007, Reference Fabricius, Calhoun, Escudero, Tabain and Warren2019). What has changed, however, is the overall size of the vowel space, with a general inward convergence of the short vowels distinguishing Chelsea from Essex. Acoustic diagnostics are generally consistent with the idea that this pattern of inward convergence is linked to the adoption of a lax voice articulatory setting among Chelsea speakers, one in which a “lowered larynx … and a relatively immobile jaw” (Laver, Reference Laver1980:155) generate the centralized vowel realizations that we find. Tellingly, these acoustic and articulatory features of lax voice parallel the metapragmatic descriptions of contemporary RP outlined previously (e.g., Kristen Stewart’s “detached jaw” and Russell Kane’s description of “squashed vowels”). Given these complementary sources of evidence, we suggest that Chelsea speakers are indeed adopting a lax voice articulatory setting and that it is the use of this setting that sets their speech apart not only from speakers in Essex but also from earlier iterations of RP.
Discussion
Embodying eliteness
The natural next question is why Chelsea speakers are doing this: why are individuals in Chelsea adopting a lax voice articulatory setting (and not individuals in Essex, for example)? In previous work (e.g., Levon & Holmes-Elliott, Reference Levon and Holmes-Elliott2024), we have argued that the answer to this question lies in the specific ethnokinesics of class in Britain, that set of ideologies that links certain bodily postures and movements with different social class positionings (Agha, Reference Agha2007:272-277). Research in sociology has argued that social class in Britain historically has been organized in terms of an ethics of restraint, such that dominant conceptualizations of “prestige” and “respectability” treat these constructs as being negatively correlated with the expression of emotion (e.g., Cannadine, Reference Cannadine1999; Lawler, Reference Lawler2005; Skeggs, Reference Skeggs1997). Recent research confirms the ongoing relevance of this ideal. In a study of elite government workers in Britain, for example, Friedman (Reference Friedman2021) described how senior civil servants are taught to enact a form of “studied neutrality,” a stance that they see as intimately tied to competence and authority (see also Ashley, Reference Ashley2021). Friedman described this ideal of neutrality as a form of embodied cultural capital, a somatic disposition that serves to legitimate civil servants’ claims to authority and prestige. In other words, Friedman argued that neutrality is a form of bodily hexis (Bourdieu, Reference Bourdieu1977), a conventionalized mode of comporting one’s self that is emblematic of elite status in British society.
This norm includes adopting a posture of stoicism, as evidenced by the high cultural value placed on tropes like the “‘stiff upper lip’ and ‘controlled excitement,’ [where] strong emotions are cultivated but always kept under control” (Bull, Reference Bull2019). It also includes the corporeal enactment of indifference. This was on display, for instance, in the case of Jacob Rees-Mogg, a Conservative Member of Parliament, who was lambasted in the press for reclining on the benches in the House of Commons during a debate about Brexit in September 2019. Anna Turley, a Member of Parliament from the opposition Labour Party, was quoted as describing Rees-Mogg’s behavior as “the physical embodiment of arrogance and entitlement” (Rawlinson, Reference Rawlinson2019). While clearly critical, comments such as Turley’s demonstrate the cultural association between embodied indifference and elite social status that exists in the UK.
Applying this cultural background to the current study, we argue that the adoption of lax voice by speakers in Chelsea functions quite literally as a physical embodiment of British ideals of eliteness. Specifically, we propose that Chelsea speakers orient to a set of qualities associated with elite status: restraint, detachment, indifference. This orientation pushes them to adopt a particular bodily posture (lax voice) that is iconically linked to these qualities. This articulatory posture, in turn, correlates with specific linguistic outcomes (compression in the lower half of the vowel space). According to this account, the positioning of the jaw and tongue among Chelsea speakers functions like any other form of bodily comportment, serving as a symbolic strategy for aligning with a culturally elite persona (Agha, Reference Agha2007). Chelsea speakers strategically adopt a lax voice setting as a way of “doing” eliteness (see also Levon & Holmes-Elliott, Reference Levon and Holmes-Elliott2024; Podesva, Reference Podesva, Hall-Lew, Moore and Podesva2021; Pratt, Reference Pratt2023a, Reference Pratt2023b for a more detailed discussion of bodies, linguistic variation, and social personae).
While our argument about Chelsea speakers’ strategic embodiment is preliminary, we submit that it is consistent with the available evidence, including diachronic comparisons with prior work on RP, acoustic diagnostics of articulatory setting in Chelsea versus Essex, and metapragmatic discussions of “posh” speech in Britain today. Further, as we highlight in our introduction, we note that analogous links between specific embodied postures and elite status have been reported for multiple English varieties across North America. Once again then, we have an example of a specific bodily posture (open jaw) used to perform eliteness, resulting in a strikingly similar set of linguistic outcomes as those examined here. These outcomes are also similar to the patterns of vowel lowering described by Hickey (Reference Hickey2018) for Irish English and by Chevalier (Reference Chevalier and Hickey2019) for South African English, where both authors observed the association of these changes with elite speakers and notions of overt prestige.
More broadly, it seems plausible to see a connection between parodic representations of elite speech patterns and vectors of language change. D’Onofrio et al. (Reference D’Onofrio, Pratt and Van Hofwegen2019) argued that the most recent advancements in the Californian Vowel Shift are best understood as a compression of the vowel space, where younger speakers have significantly smaller vowel space areas than older speakers. They suggested that the diachronic patterns are a result of the vowel space as a whole acting as a carrier of social meaning (see also Pratt, Reference Pratt2023b). This account suggests that the positive social associations that characterize the compressed vowel space in California (elite), along with the attractive embodied personae it is associated with (valley girl, surfer, etc.), may motivate the adoption of this articulatory setting by young California speakers and so help to explain recent advances in the shift. Likewise, we propose that our analyses of the vowel systems of speakers in Made in Chelsea and The Only Way is Essex point to a new pattern of elite distinction emerging, one characterized not by a further anticlockwise rotation in the RP system, but rather by an overall shrinking of the vowel space among Chelsea speakers. A similar motivating force to that proposed for California could therefore be at work within Southern British English. In the British context, the association of a backed and compressed articulatory setting with eliteness could mean that it is readily adopted by young RP speakers, but rejected by their Essex counterparts (whose identity is traditionally linked to the working-class Cockney speakers of London’s East End; see Cole, Reference Cole2021).
Conclusion
In his discussion of contemporary articulations of privilege, Khan (Reference Khan2011) argued that elite status today is experienced as a sense of ease, an ability to stand above the fray and be unaffected by changing circumstances or situations (see also Thurlow & Jaworski, Reference Thurlow and Jaworski2017). We suggest that this sense of ease has become enregistered in a series of embodied postures, potentially including the use of lax voice. If this were the case, it could then be possible to see the similar linguistic patterns we find in different regions (London, Philadelphia, California, Ireland, and South Africa) as local manifestations of a shared orientation to ease as the way one enacts eliteness in the present day. To support the claim that lax voice circulates globally as an enregistered marker of elite status, further studies of local ethnokinesic systems and the link between articulatory settings and observed linguistic outcomes are needed. However, the consistent social profile of vowel centralization across different English varieties presents a compelling empirical observation, and one that offers a promising avenue for further comparative research.
Whether or not this further research ends up supporting our proposal that lax voice functions as a global emblem of eliteness, we hope in this article to have demonstrated that RP today is different from earlier versions of the variety. Analyses of the vowel system of speakers in Made in Chelsea demonstrate that the anticlockwise rotation of the short vowels that took place in RP over the course of the twentieth century (Fabricius, Reference Fabricius, Calhoun, Escudero, Tabain and Warren2019) has not progressed further. Instead, Chelsea speakers’ vowel space areas are smaller than those in modern RP (and in Essex). We argue that this shrinking is consistent with the adoption by Chelsea speakers of a particular articulatory setting, lax voice, which has the effect of causing short vowels to centralize and is leading to widespread overlap among vowel classes in the lower half of the vowel space.
We believe that our arguments are important for three reasons. First, they contribute to the study of RP as a sociolinguistic variety, one that is habitually used by a given population of speakers and not (or not only) an abstract standard. As Fabricius (Reference Fabricius, Braber and Jansen2018) noted, further research on variation in RP use among elite speakers is necessary if we hope to provide an adequate description of the variety, its social distribution, and the ways it may be changing (see also, inter alia, Fabricius, Reference Fabricius2000). Related to this, our analysis also contributes to the study of elite speech and the ways in which elite distinction is constructed and communicated. While sociolinguistics has traditionally been focused on describing nonstandard and other stigmatized varieties, based on the fact that doing so serves important theoretical and political purposes, we maintain that elite varieties are equally deserving of our attention since combatting sociolinguistic prejudice and hierarchy also requires understanding how those in positions of power maintain their dominant status. As Coupland (Reference Coupland2000:624) noted, “elites perpetuate elite society by being seen to be elites, and … by defining the capital value of symbols at their disposal” (see also Fabricius, Reference Fabricius, Braber and Jansen2018:37).
This then leads to the third contribution we hope to make, which is the proposal that socially meaningful variable patterns may be grounded in, and motivated by, culturally relevant forms of embodiment. In her discussion of the relationship between social meaning and sound change, Eckert (Reference Eckert2019:1) argued that “sound change spreads by virtue of its being incorporated in a system of social meaning … in which non-referential meaning is recruited into signs articulating social distinction.” We argue that, at least in certain cases, the non-referential linguistic sign can come to articulate social distinction by virtue of being tied to particular forms of bodily enactment, a product of speakers adopting embodied interactional styles in order to differentiate themselves from others in the social landscape (cf. Esposito & Gratton, Reference Esposito and Gratton2020). In this respect, we hope to have demonstrated the potential relevance of reading language through the body (Bucholtz & Hall, Reference Bucholtz, Hall and Coupland2016) and to have promoted the idea that sound change may not always (or not only) be about the sounds themselves, but also about the bodies that sounds emanate from.
Competing interests
The authors declare none.
Appendix