Introduction
After an extended period on the periphery of applied linguistics, the field of second language (L2) pronunciation has seen numerous advancements over the past decade that have brought this subfield increased research activity and visibility. As Derwing (2010) underscored in her 2009 plenary at the first annual Pronunciation in Second Language Learning and Teaching (PSLLT) conference, a record number of graduate students researching L2 pronunciation and subsequently launching into academic positions at international universities assures L2 pronunciation a bright future in research and teacher training. Other indicators of momentum include the focus of a Language Teaching timeline on the topic of pronunciation (Munro & Derwing 2011), the appearance of multiple encyclopedia volumes and handbooks of pronunciation (e.g. Levis & Munro 2013; Reed & Levis 2015), and the establishment of the specialized Journal of Second Language Pronunciation in 2015, which constitutes a milestone in the professionalization of the field and ‘an essential step toward a disciplinary identity’ (Levis 2015: 1).
These positive developments notwithstanding, the vast majority of renewed applied pronunciation research has been undertaken by researchers in second language acquisition (SLA), language pedagogy, sociolinguistics, and psycholinguistics. The language assessment community has been slower to take up pronunciation, and until recently few advocates had drawn attention to its exclusion from the collective research agenda or underscored its marginalization as an assessment criterion in L2 speaking tests (e.g. Harding 2013; Purpura 2016). Pronunciation remains under-conceptualized in models of communicative competence/communicative language ability (Isaacs 2014) and typically receives minimal coverage in standard texts, such as Luoma's (2004) Assessing speaking from the Cambridge Language Assessment series. Although that series includes dedicated books on assessing grammar and assessing vocabulary, there is none on assessing pronunciation or pragmatics. The treatment of pronunciation in Fulcher's Language Teaching timeline on assessing L2 speaking is indicative: it is singled out as the only area relevant to the L2 speaking construct that he was ‘not able to cover’ (2015: 201).
However, there are signs suggesting that pronunciation is also beginning to emerge as an important research area in language assessment. For example, whereas only two pronunciation-focused articles were published in the first 25 years of the longest-standing language assessment journal, Language Testing (1984–2009), at least one such article per year has appeared in the years since (2010–). Assessment issues have recently been featured in major events on pronunciation teaching and learning (e.g. the 2012 PSLLT invited roundtable on pronunciation assessment), while pronunciation has been featured in assessment-oriented discussions (e.g. the 2013 Cambridge Centenary Speaking Symposium, which will feed into a special issue of Language Assessment Quarterly; Lim & Galaczi forthcoming). A general shift in attention in language assessment research toward pronunciation and fluency has followed the introduction of fully automated standardized L2 speaking tests. Finally, the growing use of English as a lingua franca (ELF) in diverse international contexts, brought about by globalization and technological advancements, has catapulted the issue of defining an appropriate pronunciation standard to the frontline of assessment concerns (e.g. Davies 2013; Jenkins 2006), with discussions extending to pronunciation norms in lingua franca contexts for languages other than English (Kennedy, Blanchet & Guénette 2017). New edited volumes (Isaacs & Trofimovich 2017; Kang & Ginther in press) are taking stock of these developments, fusing perspectives from research communities between which there has hitherto been little communication.
This resurgence can be seen as part of a cycle, as there have been times in the past when pronunciation was at the forefront of language teaching, learning, and assessment (Isaacs 2014). The goal of this timeline is, therefore, to chart a clear historical trajectory of pronunciation assessment. In doing so, we underscore how conceptualizations and practical implementations have evolved over time, with influences from teaching methodologies, theoretical frameworks, and seminal research that evidence (or, in the case of newer pieces, have potential for) ‘historical reverberation’. Throughout, we chart how new lines of inquiry may be instigating or reinforcing change in assessment practice, establishing links where possible between work in different eras.
The starting point for this endeavour requires defining the terms ‘pronunciation’ and ‘assessment’. In the context of this review, ‘pronunciation’ is inclusive of both segmental (individual sounds) and suprasegmental (prosodic) features, although the assessment instruments cited (e.g. rating scales) have their own operational definitions that may diverge from this. Following Bachman (2004), the term ‘assessment’ refers to any systematic information-gathering process used to foster an understanding of the phenomenon of interest (e.g. learners’ ability or processes). By contrast, a ‘test’ denotes a particular type of assessment in which a performance is elicited and an inference/decision is made about that performance, usually on the basis of a test score. All tests are assessments, but not all assessments are tests, although tests are the most common type of formal assessment. Because tests tend to be higher-stakes and more ubiquitous than other assessment types, they are well represented in the timeline, which includes both direct citations of assessment instruments and the research and validation work underpinning their development and use. No timeline can be exhaustive, and we acknowledge that English is overrepresented as the target language in the entries included here.
Much of the focus of the timeline is on defining a suitable standard for assessing pronunciation (e.g. native-like accuracy vs intelligible/comprehensible speech), arriving at an adequate operational definition of pronunciation, or considering pronunciation in relation to some conception of aural-oral ability or communicative competence/communicative language ability. Although from a research perspective the terms ‘intelligibility’ and ‘comprehensibility’ are frequently distinguished in how they are operationalized (e.g. using orthographic transcriptions vs rating scales in Derwing & Munro's (2015) conception, although Smith & Nelson (1985) offer a different interpretation), these terms have not been used consistently in L2 speaking scales. The term used in the timeline is simply the one used by the author of the cited publication or assessment instrument.
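To make this operational contrast concrete, the minimal sketch below illustrates the two measurement approaches in Python. It assumes a simplified word-matching procedure for transcription-based intelligibility scoring and a scalar listener rating for comprehensibility; the function names and scoring details are illustrative simplifications of our own, not the protocols of the studies cited above.

```python
# Illustrative only: simplified operationalizations of two listener-based measures.

def intelligibility_score(target: str, transcription: str) -> float:
    """Proportion of target words a listener transcribed correctly
    (a naive word-by-word match; actual studies use more careful scoring)."""
    target_words = target.lower().split()
    heard_words = transcription.lower().split()
    matches = sum(1 for t, h in zip(target_words, heard_words) if t == h)
    return matches / len(target_words)

def mean_comprehensibility(ratings: list[int]) -> float:
    """Average of listeners' scalar ease-of-understanding ratings
    (e.g. on a 9-point scale)."""
    return sum(ratings) / len(ratings)

# The same utterance can pattern differently on the two measures.
print(intelligibility_score("the dog chased the cat", "the dog chase the cat"))  # 0.8
print(mean_comprehensibility([6, 7, 5]))  # 6.0
```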
Another prominent line of inquiry relates to reliability: how might pronunciation be assessed objectively? Individual differences in the characteristics of those scoring pronunciation assessments can unduly influence or bias the assessment, which raises issues of test fairness. Human raters can now be supplanted by modern technology, which addresses the issue of human behavioural variability. However, machine scoring of speech is not without limitations: automated scoring systems are, as yet, only able to robustly approximate human judgments on highly controlled L2 speaking tasks that yield predictable learner output (e.g. sentence read-aloud, construction, or repetition tasks). This has raised concerns within the assessment community about the narrowing of the L2 speaking construct under automated scoring (e.g. interactional patterns not captured; tasks relatively inauthentic; Chun 2006). Although improvements in technological capabilities offer much promise for the future, it is humans (not computers) who are relevant in the context of real-world communicative transactions. Relative to this standard, against which machine scoring will continue to be compared, there will always be limitations to what machines are able to measure and simulate (Isaacs 2016).
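The comparison between machine scores and human judgments that underlies such validation work can be illustrated with a short sketch. The example below, using invented data, computes a Pearson correlation between averaged human ratings and automated scores for the same set of responses; in practice, validation studies draw on a wider range of agreement statistics, and the scores shown here are purely hypothetical.

```python
# Illustrative only: a human-machine agreement check with invented scores.
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

human = [3.5, 4.0, 2.5, 5.0, 3.0]    # averaged human ratings per response
machine = [3.2, 4.1, 2.9, 4.8, 3.3]  # automated scores for the same responses

r = pearson_r(human, machine)
print(f"human-machine agreement: r = {r:.2f}")
# High agreement is typically reported only for constrained tasks
# (e.g. read-aloud), echoing the construct-narrowing concern above.
```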
To capture the scope of topics and sources of influence, we assigned each entry to one or more of a range of themes. The themes were initially devised to cover four key areas: operational assessment systems, practitioner-oriented guides, theoretical frameworks, and research studies/syntheses. However, given that peer-reviewed journal articles and other research publications constituted over two-thirds of the entries, the fourth area – research studies/syntheses – was split into three further categories: research investigating learner performance or development; research examining the role of non-linguistic factors in pronunciation assessment; and research taking a broader view of assessment in relation to SLA or language pedagogy. The resulting themes are:
- A: A language test or scoring system, including rating scales and automated assessments
- B: A teaching methodology or assessment-oriented guide for language researchers and/or practitioners
- C: A theoretical framework of language ability, knowledge, and/or processing
- D: Research on defining or validating speech-related constructs, either as operationalized in an assessment instrument, or through investigations of human- or machine-derived linguistic measures in relation to learner performance or development
- E: Research on the effects of non-linguistic variables (e.g. attitudes, accent familiarity, age) on speakers’ or listeners’ test/task performance or on listeners’ (raters’/examiners’) judgments of speech
- F: Lab- or classroom-based L2 research incorporating a broader notion of assessment, including studies examining the effectiveness of pedagogical interventions
Talia Isaacs is a Senior Lecturer in Applied Linguistics and TESOL at the UCL Centre for Applied Linguistics, UCL Institute of Education, University College London. Her research examines sources of variability in listeners’ judgments of speech, including mapping the factors promoting/impeding efficient oral communication in rating scale descriptors. She has taught a range of applied linguistics courses, including second language acquisition, language assessment, pedagogy and curriculum, oral communication, and research methodology. She currently serves on the executive board of the International Language Testing Association (Member-at-Large) and on the editorial boards of the Journal of Second Language Pronunciation, Language Assessment Quarterly, and Language Testing.
Luke Harding is a Senior Lecturer in the Department of Linguistics and English Language at Lancaster University. His research is mainly in the area of language testing, specifically listening assessment, pronunciation and intelligibility, and the challenges of World Englishes and ELF for language assessment. He regularly teaches on Lancaster's MA in Language Testing, on courses including Issues in Language Testing and Statistical Analyses for Language Testing. He is the test reviews editor for the journal Language Testing and is on the editorial boards of Language Assessment Quarterly and the Journal of Second Language Pronunciation.