Introduction
Dementia is a highly prevalent neurodegenerative disorder and a leading global cause of disability and mortality (Vos et al., 2017; Nichols et al., 2019). It has been estimated that 46.8 million people were living with dementia worldwide in 2015, with numbers expected to rise to 131.5 million in 2050 (Alzheimer’s Disease International, 2015). Recent global research prioritization initiatives and policies aiming to reduce the burden of dementia have highlighted the importance of early and accurate diagnosis (Alzheimer’s Disease International, 2012; Shah et al., 2016; World Health Organization, 2017). Among other benefits, a timely diagnosis can result in earlier interventions such as prescription of acetylcholinesterase inhibitors to maintain function, enhanced advance care planning for patients and their families, and identification of relevant agencies and support networks (Dubois et al., 2016; Robinson et al., 2015). To establish a diagnosis of dementia, it must be ascertained that a decline in cognition compared to previous levels of functioning has occurred. Determining whether a change in cognitive ability has taken place can be challenging in clinical practice, however, as previously obtained measures of cognition are rarely available at clinical presentation.
To remedy this problem, various assessments have been developed to estimate premorbid intelligence. Three of the most commonly used approaches are demographic regression equations (Barona et al., 1984; Wilson et al., 1978), irregular word reading tasks (Nelson and O’Connell, 1978; Wechsler, 2001; Wechsler, 2011), and lexical decision-making tasks (Baddeley et al., 1993; Yuspeh and Vanderploeg, 2000). The first method computes an estimated intelligence quotient (IQ) based on variables such as education, geographic residence, and occupation. A major advantage of regression equations is that demographic details are independent of current levels of functioning, and therefore are inherently unaffected by cognitive decline due to dementia. In practice, however, reliability may be hampered by difficulties in obtaining accurate information from patients and/or limited access to demographic records. Furthermore, it has been suggested that regression equations tend to provide inaccurate estimates for people with an IQ outside the average range (Goldstein et al., 1986; Veiel and Koopman, 2001; Griffin et al., 2002), and can only predict approximately 50% of the variance in measured intelligence (O’Carroll, 1995).
Irregular word reading and lexical decision-making tasks, on the other hand, rely on current performance rather than self-reported information. Scores on these tasks are strongly correlated with general intelligence assessments in healthy adults (Crawford et al., 1989a; Yuspeh and Vanderploeg, 2000). Their use as a measure of premorbid intelligence is based on the assumption that the ability to pronounce irregularly spelled words or differentiate real words from pseudo-words is relatively resistant to cognitive decline. However, although some early studies supported the view that performance on these tasks is stable in dementia (O’Carroll et al., 1987; Nelson and McKenna, 1975; Sharpe and O’Carroll, 1991; Crawford et al., 1988a), others reported significantly lower scores in patients compared with healthy controls (Stebbins et al., 1990; Patterson et al., 1994; O’Carroll et al., 1995; Schmand et al., 1998). In addition, while good test–retest and inter-rater reliability have been demonstrated for several instruments in healthy adults (Crawford et al., 1989a; O’Carroll, 1987; Dykiert and Deary, 2013), it is unclear whether these psychometric properties extend to clinical populations. Furthermore, the use of these measures across cultures and languages has not been systematically evaluated.
Considering their widespread use in clinical settings, a better understanding of the validity of measures for estimating premorbid intelligence in dementia is vital. Accurate estimates are key to correct interpretations of scores on cognitive screening tests, and consequently accurate diagnosis of dementia. The present systematic review focuses on the following questions: (1) what assessments are currently available for the measurement of premorbid intelligence in dementia?; (2) do estimated premorbid intelligence scores on these instruments remain stable in dementia? That is, are task scores similar for healthy adults and patients in cross-sectional comparisons, and/or are patient scores constant over time?; and (3) what are the psychometric properties of the identified tools in people living with dementia? The main objective of this review is to clarify which measures of premorbid intelligence may be most suitable for assessing patients with dementia in global clinical practice.
Methods
Search strategy
The predefined protocol for this systematic review was registered with PROSPERO (CRD42019133499) and was based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2015). Databases (EMBASE, PsycINFO, MEDLINE, CINAHL, and AMED) were searched from 1999 until May 2019 using the NICE Healthcare Databases Advanced Search. Papers were identified through Boolean operators using keywords for dementia (“dementia” OR “Alzheimer”) and premorbid intelligence (“premorbid” AND [“intelligence” OR “intellect”]) with the thesaurus “explode” function. The search was restricted to papers published in the English language. References in the selected journal papers and previous reviews were screened manually to supplement the main search methods.
Paper selection
Titles and abstracts and, where appropriate, full text of identified citations were independently screened by two authors (MJO and SL). Any disagreements were resolved by consensus and a third author (TJW) was consulted when needed. The following criteria had to be met for inclusion in the review:
1) Published as a journal paper or letter.
2) Participants had a diagnosis of any type and severity of dementia, except for dementia secondary to acquired brain injury or non-neurological disease.
3) Diagnosis of a dementia was determined using standardised criteria (e.g. Diagnostic and Statistical Manual of Mental Disorders, International Classification of Diseases).
4) Performance on an objective assessment of premorbid intelligence was a primary or secondary outcome, and its relation to diagnosis or severity of dementia was investigated with statistical analyses.
Data extraction
The following details were extracted independently by two authors (MJO and SL) from each study using a structured form: study characteristics (study design, sample size, recruitment site, and diagnostic criteria used), participant demographics (mean age, years of education, type and severity of dementia, percentage of female participants), assessment scale of premorbid intelligence (name, type, language, and scores of patient and control groups), psychometric properties in the patient group (test–retest reliability and inter-rater reliability), and results (statistically significant findings at p < 0.05, unless otherwise determined by the authors). Disease severity was based on the range and mean of Mini-Mental State Examination (MMSE) scores reported in each study, with cognitive impairment being recorded as mild (MMSE 21–26), moderate (MMSE 14–20), moderately severe (MMSE 10–14), or severe (MMSE <10) in line with NICE guidance (National Institute for Health and Care Excellence, 2011). Where clearly identified, patients with a diagnosis of mild cognitive impairment (MCI) were excluded from the reported values and outcomes. The rationale for this exclusion is that MCI is a highly heterogeneous syndrome, which in some cases progresses to dementia but can also remain stable or even reverse over time (Gauthier et al., 2006). Where possible, Cohen’s d was estimated for the core comparisons by computing the difference between reported group means divided by their pooled standard deviation. The pooled standard deviation was calculated as per Cohen (1988):
$$SD_{pooled} = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$$
with $SD_1$ and $SD_2$ denoting the standard deviations for each group and $n_1$ and $n_2$ referring to their respective sample sizes. Effect sizes were interpreted as small (d = 0.2), medium (d = 0.5), or large (d = 0.8).
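The effect size computation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' analysis code: it assumes the common sample-size-weighted form of the pooled standard deviation, the function names are hypothetical, and treating the benchmarks 0.2, 0.5, and 0.8 as lower cut-offs for each label is an assumption made here for convenience. The example values are invented and do not come from any included study.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Sample-size-weighted pooled standard deviation of two groups (assumed form)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Difference between group means divided by the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def interpret_d(d: float) -> str:
    """Label an effect size, using 0.2 / 0.5 / 0.8 as lower cut-offs (assumption)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Hypothetical summary statistics: controls (mean 110, SD 10, n = 30)
# versus patients (mean 104, SD 10, n = 30).
d = cohens_d(110, 10, 30, 104, 10, 30)
print(round(d, 2), interpret_d(d))  # 0.6 medium
```

Because only group means, standard deviations, and sample sizes are needed, the same calculation can be applied to any study that reports these summary statistics.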
Quality assessment and data synthesis
Study quality and risk of bias were evaluated according to the AXIS tool (Downes et al., 2016), a 20-item checklist designed for quality assessment of observational studies. Quality was appraised according to the number of items for which a “yes” response was recorded and rated as “high quality” (15–20), “moderate quality” (8–14), or “low quality” (0–7). A list of instruments for the assessment of premorbid intelligence in dementia was generated from the selected papers. The key outcomes of the identified studies and the psychometric properties of the instruments, where available, are presented in a narrative synthesis.
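The rating bands described above amount to a simple mapping from the number of “yes” responses to a quality label. A minimal sketch of that mapping is shown below; the function name is hypothetical and the code only restates the bands given in the text.

```python
def axis_quality(yes_count: int) -> str:
    """Map the number of 'yes' responses on the 20-item AXIS checklist to a rating."""
    if not 0 <= yes_count <= 20:
        raise ValueError("AXIS 'yes' count must be between 0 and 20")
    if yes_count >= 15:
        return "high quality"
    if yes_count >= 8:
        return "moderate quality"
    return "low quality"


# Boundary values for each band.
for score in (7, 8, 14, 15):
    print(score, axis_quality(score))
```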
Results
Study selection and characteristics
Titles and abstracts of all identified papers after removal of duplicates (n = 304) were screened, with 13 studies meeting the stipulated eligibility criteria after full-text review. A flow diagram of the identification and attrition of studies is provided in Figure 1. Study design and demographic details of participants were recorded for all studies (see Table 1). Ten of the studies applied a cross-sectional design, with three studies investigating the change in premorbid intelligence scores over time. A total of 3669 participants were included with a mean age of 71.72 years (SD ± 7.06) across the studies. Of the 1082 patients with dementia, 929 participants were diagnosed with Alzheimer’s disease, 40 participants had vascular dementia, and dementia subtype was unspecified for 113 participants. Four of the studies (30.8%) only included patients with mild dementia, five studies (38.5%) involved patients with mild to moderate dementia, and four studies (30.8%) included patients from the mild to severe range. The studies were conducted in eight different countries (Australia, Brazil, Germany, Japan, Portugal, Sweden, the UK, and the USA).
NR, Not reported.
† Only reported for patient group.
‡ Reported as age left education.
§ Reported as categorical variable.
Identified instruments
Nineteen objective measures of premorbid intelligence were identified, including revisions, parallel versions, and variants in different languages. The most commonly investigated tools were word reading tasks (47.4%), followed by lexical decision tasks (21.1%), visuospatial reasoning tasks (15.8%), demographic equations (10.5%), and a word description task (5.3%). The majority of assessments were conducted in English (47.4%) and Portuguese (31.6%), with the remaining tasks carried out in German (10.5%), Swedish (5.3%), and Japanese (5.3%). The key outcomes from cross-sectional and longitudinal studies for each of the identified instruments are summarized in Tables 2 and 3, respectively.
CCRT, Cambridge Contextual Reading Test; JART, Japanese Adult Reading Test; LDT, Lexical Decision Test; MWT-A, Mehrfachwahl-Wortschatz-Test – Version A; MWT-B, Mehrfachwahl-Wortschatz-Test – Version B; NART, National Adult Reading Test; NART-R, National Adult Reading Test – Revised; NART-SWE, Swedish National Adult Reading Test; STW, Spot-the-Word; TeLPI, Teste de Leitura de Palavras Irregulares; WAIS-III, Wechsler Adult Intelligence Scale III; WRAT-III, Wide Range Achievement Test III – Reading subtest; WRAT-R, Wide Range Achievement Test Revised – Reading subtest; WTAR, Wechsler Test of Adult Reading
† Predicted FSIQ.
‡ Number of errors.
NART, National Adult Reading Test; WRAT-III, Wide Range Achievement Test III; WTAR, Wechsler Test of Adult Reading
Word reading
A total of nine studies investigated the performance of dementia patients on word reading tasks (see Tables 2 and 3). English assessments included the original and revised versions of the National Adult Reading Test (NART and NART-R), the Wechsler Test of Adult Reading (WTAR), version III and the revised version of the Wide Range Achievement Test (WRAT-III and WRAT-R), and the Cambridge Contextual Reading Test (CCRT). In the NART and the WTAR, participants are asked to read a list of 50 words which have irregular grapheme–phoneme correspondences (Nelson, 1982; Nelson and Willison, 1991; Wechsler, 2001). The WRAT differs from these two measures through its inclusion of words following regular spelling rules (Jastak and Wilkinson, 1984). Finally, the CCRT comprises the same words as the NART, but provides greater syntactic and semantic context by presenting each word within a sentence (Beardsall and Huppert, 1997). The aim of this adaptation is to facilitate recognition of the word and thereby improve task performance.
In the mild stage of dementia, no significant differences were observed between healthy adults and patients on any of these tasks in two cross-sectional studies (McCarthy et al., 2005; McFarlane et al., 2006). Direct task comparisons indicated that the performance of patients with mild dementia was better on the CCRT than the NART, suggesting that embedding words within a sentence may improve scores in dementia patients (McFarlane et al., 2006). When disease severity was moderate, one study reported similar scores for healthy adults and patients on the WRAT-R and WRAT-III (McCarthy et al., 2005). In contrast, performance on the NART (McGurn et al., 2004; McFarlane et al., 2006), NART-R (McCarthy et al., 2005), WTAR (McFarlane et al., 2006), and CCRT (McFarlane et al., 2006) was significantly lower for patients with moderate dementia than control participants. A small effect size for the group differences in word reading scores was found for the NART by McFarlane et al. (2006), while the remaining studies observed a medium effect size on the NART, NART-R, WTAR, and CCRT.
In a longitudinal study, WRAT-III reading scores were significantly higher for control participants than patients with Alzheimer’s disease in baseline assessments, but raw scores did not decline significantly in either patients or controls over a 1-year period (Ashendorf et al., 2009). In studies with a longer follow-up time of 3 years, however, steeper declines in performance were observed for patients than controls on the NART (Cockburn et al., 2000), and lower MMSE scores were systematically associated with a greater decline on the WTAR (Weinborn et al., 2018). In the latter, the authors noted that a large proportion of the recruited patients were lost to follow-up due to death (18.5%) or withdrawal (41.4%). Moreover, only 75 of the remaining 132 patients were able to complete the WTAR on follow-up, with participants who could not carry out the assessment being more cognitively impaired at baseline (Weinborn et al., 2018). The degree of decline on the WTAR may therefore be larger than estimated in the assessed patient group.
In addition to these English tasks, the systematic search identified adaptations of the NART into Portuguese (TeLPI; Alves et al., 2013), Swedish (NART-SWE; Rolstad et al., 2008), and Japanese (JART; Matsuoka et al., 2006). Due to language differences, the NART-SWE items consisted of loan words rather than irregular Swedish words (Rolstad et al., 2008). The JART was based on Kanji characters, an ideographic script which is used to represent lexical morphemes. Many Japanese words are compounds comprised of multiple Kanji, and the pronunciation of an individual character can vary across different words. The authors propose that the JART provides a suitable adaptation of English irregular word reading tasks as it similarly requires word-specific translations from orthography to phonology (Matsuoka et al., 2006). For all three tasks (the TeLPI, NART-SWE, and JART), comparable task performance was observed in patient and healthy control groups (Rolstad et al., 2008; Matsuoka et al., 2006; Alves et al., 2013). It should be noted that all studies focused on patients with mild dementia. Overall, these findings suggest that word reading tasks may have potential as an assessment of premorbid functioning across different languages in early dementia.
Lexical decision-making
The four studies assessing lexical decision-making tasks all focused on the Spot-the-Word (STW) task or adaptations of this instrument. In the original STW task, participants have to select which of a word pair is the real word versus a pseudo-word (Wechsler, 2011). The assessment comprises 60 pairs of real words and pseudo-words of varying word length and frequency. An equivalent version of this lexical decision-making task (LDT) was developed in Portuguese by Serrao and colleagues (2015). Two German adaptations used the same general principle as the STW, but required participants to identify the real word among four pseudo-words rather than presenting word pairs (Mehrfachwahl-Wortschatz-Test A and B, MWT-A and MWT-B; Binkau et al., 2014; Hessler et al., 2013).
In the English STW task, test scores were comparable for controls and patients with both mild and moderate dementia (McFarlane et al., 2006). In the Portuguese LDT, mild dementia was associated with numerically lower scores than healthy adults, but this difference was not significant in general linear models (Serrao et al., 2015). For the German adaptations of the task, on the other hand, scores were significantly lower for patients than controls on both the MWT-A (Binkau et al., 2014) and MWT-B (Hessler et al., 2013). Impaired performance was observed across the entire spectrum of disease severity, from mild to severe dementia. In addition, effect sizes for these differences were estimated to be medium to large.
Demographic regression equations
The Barona Index (Barona et al., 1984) and a demographic regression equation based on Crawford and colleagues’ (1989b) work were assessed in two cross-sectional studies (McCarthy et al., 2005; McFarlane et al., 2006). The Barona Index is based on age, sex, race, education, occupation, and geographical residence, whereas the latter equation includes the variables age, total years of education, and social class. In both studies, estimated premorbid IQ scores were similar for the patient and control groups. As would be expected given the task’s reliance on stable demographic characteristics as opposed to current performance, results were similar across patients with mild and moderate cognitive impairments.
Other assessments
Three less frequently used assessments of premorbid intelligence were identified in the present review. First, a Portuguese version of the Vocabulary subtask of the Wechsler Adult Intelligence Scale III (WAIS-III) was investigated in two cross-sectional studies (Serrao et al., 2015; De Oliveira et al., 2014). In this task, participants are asked to provide definitions of a list of words. The two studies demonstrated that Vocabulary task performance was similar in patients with mild dementia and controls with either normal (Serrao et al., 2015) or low levels of education (De Oliveira et al., 2014).
Finally, the remaining two assessments focused on performance in the visuospatial domain rather than verbal abilities. Two studies conducted in Portuguese (Serrao et al., 2015; De Oliveira et al., 2014) considered participants’ scores on a Matrix Reasoning task derived from the WAIS-III (Wechsler, 1997). In addition, one of these studies assessed people’s performance on a Block Design task from the WAIS-III (De Oliveira et al., 2014). Both Matrix Reasoning and Block Design draw on visuospatial problem-solving skills rather than knowledge acquired through past learning. It was found that participants with mild dementia scored significantly lower on all of these tasks compared with healthy individuals (Serrao et al., 2015; De Oliveira et al., 2014). The effect sizes of these group differences were large on all visuospatial tasks. There was thus no evidence to support the use of perceptual problem-solving tasks to estimate premorbid IQ in dementia.
Psychometric properties
Of the 13 studies evaluated here, only one reported reliability measures within the dementia patient group. Ashendorf and colleagues (2009) found that test–retest reliability for the WRAT-III word reading task was .90 in the subgroup with Alzheimer’s disease, indicating high stability of test scores across repeated measurements. No further statistics for test–retest or inter-rater reliability were provided for any of the other measures.
Quality assessment and risk of bias
All included studies were deemed to be of moderate (n = 11) to high quality (n = 2) as assessed with the AXIS tool (see Table 4). The criteria least frequently met were justification for the sample size (n = 13), representative participant selection (n = 13), and addressing and categorizing nonresponders (n = 12). These findings suggest that there was a risk of selection and nonresponse bias in the majority of the studies reported here.
Discussion
The main aims of the present review were to identify and evaluate instruments for estimating premorbid intelligence in people living with dementia. Our findings suggest that while a wide range of tools has been assessed for this purpose, evidence for their validity in patients with dementia is rather mixed. Furthermore, the lack of reliability testing across studies highlights the need for further information regarding the psychometric properties of the identified instruments. We will discuss the core findings and their implications for the assessment and diagnosis of dementia in clinical practice, and propose several directions for future research.
Stability of verbal task performance in early dementia
Of the 19 tools for estimating premorbid intelligence evaluated here, the vast majority consisted of verbal assessments. While a number of studies indicated that performance on word reading, lexical decision-making, or vocabulary tasks was unaffected by a diagnosis and/or severity of dementia, others reported significant differences between scores of healthy adults and patient groups or declining scores over time.
Word reading tasks were most frequently investigated, with a total of nine different instruments being identified. However, it should be noted that the majority of the findings on word reading performance stem from only two studies, which both examined several different tasks (McCarthy et al., 2005; McFarlane et al., 2006). McCarthy et al. (2005) found that performance on the NART-R was reduced in moderate but not mild dementia, whereas there was no evidence for impairment on the WRAT-III and WRAT-R in either mild or moderate dementia. McFarlane and colleagues (2006) indicated that the performance of patients with moderate (though not mild) dementia was impaired on three different measures: the NART, WTAR, and CCRT. The first explanation for the diverging findings could be that there were differences in sampling method or participant characteristics across the two studies. For example, in the study by McFarlane et al. (2006), participants with moderate dementia on average reported significantly fewer years of education than those with mild dementia and healthy controls. Furthermore, the total years of education were numerically higher for patients in the study by McCarthy and colleagues (2005). However, McFarlane et al. (2006) highlight that inclusion of education in the statistical analyses did not alter the pattern of results. The second possible reason for these conflicting findings is that, while some verbal abilities are presumed to be relatively resistant to dementia, it is improbable that they are entirely impervious to the condition. At the more severe end of the spectrum, we might therefore observe greater difficulties in completing verbal tasks.
In line with this hypothesis, high performance on word reading tasks tended to be maintained in mild dementia, whereas patients with moderate cognitive impairments scored lower than controls on the NART, NART-R, WTAR, and CCRT (McCarthy et al., 2005; McFarlane et al., 2006; McGurn et al., 2004). In the two studies investigating Vocabulary tasks, no differences between patients and healthy adults were found (De Oliveira et al., 2014; Serrao et al., 2015). Importantly, only individuals with mild dementia were included in these studies, leaving open the question of whether the performance would be similarly preserved in patients with more severe cognitive impairments. Further testing of verbal tasks across the full range of disease severity is therefore essential to determine whether such assessments are suitable beyond the early stages of dementia.
Finally, in addition to the potential impact of disease severity on task performance, we posit that variability in task demands may contribute to diverging results across verbal tasks. For example, while scores on tasks such as the NART, NART-R, WTAR, and CCRT were affected in moderate dementia and declined over time (McCarthy et al., 2005; McFarlane et al., 2006; Cockburn et al., 2000; Weinborn et al., 2018), performance on the WRAT-III and WRAT-R appeared to be relatively stable (McCarthy et al., 2005; Ashendorf et al., 2009). Interestingly, the WRAT-III and WRAT-R are the only word reading tasks which include phonetically regular items. As mistakes are presumably more likely to be made in the pronunciation of irregular than regular words, the WRAT instruments might be easier and less sensitive to cognitive impairment than tasks which consist exclusively of irregular words. In lexical decision tasks, we hypothesize that a similar effect of difficulty may be responsible for inconsistent findings across tasks. Specifically, whereas patient and control scores were similar in the original and Portuguese version of the STW task (McFarlane et al., 2006; Serrao et al., 2015), significant impairments were observed even in mild dementia on the German MWT-A and MWT-B (Binkau et al., 2014; Hessler et al., 2013). Although this difference could be due to cultural or linguistic differences between the task versions, it is not clear why this should particularly affect German but not Portuguese adaptations.
We propose that an alternative explanation could be that, whereas the English and Portuguese tasks asked participants to choose the real word from a word pair, the German versions required individuals to select the real word from a total of five words. The greater number of options likely increased the cognitive demands of the task and reduced the chance of guessing correctly from one in two to one in five.
Taken together, these findings are thus suggestive of an influence of disease severity and specific task demands on verbal premorbid intelligence scores in people living with dementia. In future research, it would be worth testing this hypothesis explicitly by (1) including participants with a wide range of scores on the MMSE or similar screening measures and (2) directly contrasting tasks which are based on a similar approach but may vary in difficulty, such as the NART and the WRAT. Crucially, in studies reporting significant differences between patients and controls on verbal tasks, the effect size tended to be medium to large. This suggests that inappropriate use of these tasks could lead to substantial underestimation of prior cognitive function, which would hamper the interpretation of neuropsychological assessments and consequently accurate diagnosis of dementia. Clarifying the cause of differences in performance across verbal tasks and establishing more firmly whether these measures are valid only in early dementia is therefore a critical next step in optimizing assessments of premorbid intelligence in patient groups.
Impact of language differences
Cultural and linguistic differences between the populations and tasks should also be taken into account when interpreting findings across studies. In this review, translations of English verbal tasks into German, Portuguese, Swedish, and Japanese were identified. For some of the word reading tasks, translating instruments which were originally developed in English was a nontrivial issue. Specifically, an instrument developed in Sweden had to rely on loan words due to an absence of irregular words (which form the basis of the NART, WTAR, and WRAT) in the Swedish language (Rolstad et al., 2008), and a Japanese adaptation used a different writing system (Kanji) (Matsuoka et al., 2006). As a consequence, it is possible that various translations of word reading tasks relied on different cognitive processes compared to the original English versions. For example, Kanji characters are perceptually highly complex and tend to have fewer phonemic factors than English written words. Guessing the reading of an unfamiliar Kanji word is therefore difficult, and reading Kanji may rely on semantic processing to a greater extent than the reading of English irregular words (Matsuoka et al., 2006). Nevertheless, the absence of group differences between patients and healthy adults in any of these adaptations is encouraging, and suggests that such word reading tasks may hold promise as a measure of premorbid intelligence across a range of languages.
Nonverbal measures
Overall, there is thus preliminary evidence that language-based assessments may be suitable for estimating premorbid intelligence in dementia, although further research is needed to clarify the effects of disease severity and specific task differences. In addition, it should be noted that such measures are likely to be of limited use for people presenting with language variants of dementia (e.g. semantic dementia or primary progressive aphasia), as well as learning difficulties such as dyslexia. It is therefore worth considering the use of alternative, nonlinguistic assessments. However, research on such measures to date appears to be very limited. While two studies investigated patients’ performance on visuospatial reasoning tasks (De Oliveira et al., Reference De Oliveira, Nitrini, Yassuda and Brucki2014; Serrao et al., Reference Serrao, Brucki, Campanholo, Mansur, Nitrini and Miotto2015), the authors highlight that these tasks were specifically included to demonstrate the deterioration of fluid intelligence in dementia compared with “crystallised” abilities such as lexical tasks. It was therefore unsurprising that impaired performance on these tasks was observed in people with dementia. In the future, it may be of interest to explore whether there are visual abilities that tend to be preserved in dementia and are good predictors of intelligence which can be exploited to devise suitable assessments for people with language difficulties.
Alternatively, it would be possible to utilize demographic equations, which are entirely independent of people’s current performance. Preliminary findings suggest that such equations tend to result in similar estimates of premorbid IQ in healthy adults and people with dementia (McCarthy et al., Reference McCarthy, Burns and Sellers2005; McFarlane et al., Reference McFarlane, Welch and Rodgers2006). However, as only two studies were identified in the present review which used this method, we cannot make any strong claims regarding their global utility in the diagnosis of dementia. Furthermore, concerns raised in previous studies regarding limitations of this approach in accurately estimating high and low ranges of IQ (Goldstein et al., Reference Goldstein, Gary and Levin1986; Griffin et al., Reference Griffin, Mindt, Rankin, Ritchie and Scott2002; Veiel and Koopman, Reference Veiel and Koopman2001) remain to be addressed.
Implications and recommendations for clinical practice
As studies have rarely performed direct comparisons of the different measures presented here, an outstanding question is which of the various tasks is most suitable for application in people living with dementia. One study which contrasted performance on the NART, WTAR, CCRT, and STW suggested that the lexical decision-making STW task was the only measure on which no significant differences between groups were observed (McFarlane et al., Reference McFarlane, Welch and Rodgers2006). The word reading tasks were all found to result in lower scores in people with mild Alzheimer’s disease compared to healthy controls, although embedding the words within sentences (as in the CCRT) was associated with better performance in the mild patient group than presenting a list of words (as in the NART). However, given that no other studies identified in this review have investigated the English STW, replication is needed to confirm the superiority of the lexical decision-making task over irregular word reading measures in estimating premorbid intelligence. Further comparisons of word reading tasks were conducted by McCarthy and colleagues (Reference McCarthy, Burns and Sellers2005), who investigated differences in scores on the NART-R, WRAT-R, and WRAT-III as well as estimates derived from the Barona demographic equation. While the authors suggest that all four measures showed similar stability relative to Full-Scale IQ scores obtained from the Wechsler Adult Intelligence Scale – Revised (WAIS-R), no statistical analyses were conducted to directly compare the performance of the individual instruments. Additional studies contrasting the efficacy of different measures are therefore needed to identify the most suitable measure for assessments of dementia.
For clinical practice, the utility of different tools depends not only on their accuracy in estimating cognitive decline, but also on the resources they require in terms of financial costs, time, and expertise. From a pragmatic point of view, versions of the language-based NART have been investigated most extensively and are currently being used by many health professionals, with the NART-R having recently been re-standardized for the WAIS-IV (Bright et al., Reference Bright, Hale, Gooch, Myhill and van der Linde2018). The NART-R may thus represent an up-to-date measure which can easily be implemented in clinical settings. However, as some cross-sectional and longitudinal studies have disputed the stability of NART scores in dementia (Cockburn et al., Reference Cockburn, Keene, Hope and Smith2000; McFarlane et al., Reference McFarlane, Welch and Rodgers2006), task performance should be interpreted with caution. In particular, the NART may not be appropriate when assessing patients with moderately impaired cognition, as a number of studies indicated that NART performance may be affected when dementia has progressed beyond the mild stage. As the specific advantages and limitations of the reviewed word reading tasks remain to be established, we propose that other sources of information should ideally be taken into account when assessing premorbid functioning. One potentially promising approach may be to combine several tools in order to increase confidence in estimations of premorbid intelligence. For instance, demographic equations which are independent of current abilities and take little time to complete could likely complement word reading assessments. This method may offer a sensible provisional solution for clinical settings while further evidence for the validity of currently available measures is being acquired.
Methodological considerations
There are several methodological limitations which should be considered in relation to both the present review and the included papers. First, we were unable to carry out a meta-analysis to formally compare findings from the different studies due to the heterogeneity of tasks and populations assessed. In addition, most of the studies evaluated here applied a cross-sectional design, which can only offer limited insight into the presence and rate of decline on specific tasks associated with dementia. Moreover, the longitudinal studies included tended to focus on people who had already been diagnosed with dementia at the time of the first assessment. While such research can provide information regarding performance changes with the progression of dementia, it does not capture performance prior to disease onset. Additional longitudinal studies, particularly those following participants before onset of dementia, would be useful for improving our understanding of the validity of different instruments for measuring premorbid intelligence.
A strength of this review is the inclusion of both English and translated versions of premorbid intelligence assessments, which were investigated in eight different countries. However, nearly all of the included studies were conducted in countries with strong economies and educational systems. It was previously demonstrated that education is highly predictive of word reading, lexical decision, and vocabulary scores (Crawford et al., Reference Crawford, Stewart, Garthwaite, Parker and Besson1988b; Kosmidis et al., Reference Kosmidis, Tsapkini and Folia2006; Starr et al., Reference Starr, Whalley, Inch and Shering1992; Walker et al., Reference Walker, Batchelor and Shores2009), and it has been suggested that reading scores can be used as a proxy for quality of education (Manly et al., Reference Manly, Jacobs, Touradji, Small and Stern2002). The validity of using verbal tasks to assess premorbid intelligence in dementia, however, has rarely been investigated in countries with fewer socioeconomic or educational resources. Only one study in the present review, which was based in Brazil, focused on participants with low levels of education (De Oliveira et al., Reference De Oliveira, Nitrini, Yassuda and Brucki2014). Here, it was found that there were no significant differences between healthy adults and patients with Alzheimer’s disease on a Vocabulary task. An untested possibility, however, is that these verbal tasks may underestimate cognitive abilities which are less strongly associated with educational background. Furthermore, it is unclear whether other, more frequently employed task types (e.g. irregular word reading) are useful for estimating premorbid intelligence in individuals with limited access to high-quality education. There is thus a clear need for more extensive testing of premorbid intelligence measures in low resource countries.
In addition, illiteracy presents a particularly important issue for the use of word reading assessments, as this inherently prevents accurate task performance even if cognition is unimpaired. According to recent estimates, approximately 750 million adults worldwide lack basic reading and writing skills (UNESCO Institute for Statistics, 2017). Exclusively relying on reading tasks as a measure of premorbid intelligence could therefore negatively affect many people across the world. In the broader neuropsychometry literature, some measures focusing on nonverbal skills have specifically been developed to measure intelligence in adults with low literacy (e.g. Ryan et al., Reference Ryan, Byrd, Mindt, Rausch and Morgello2008). However, such instruments are scarce and have not yet been validated in patients with dementia. As an alternative, it has been suggested that informant-based questionnaires may be useful for estimating premorbid intelligence in individuals with low educational levels (Apolinario et al., Reference Apolinario2013). A drawback of this approach is that the estimated abilities are dependent on the accuracy of the information provided by the informant. As well as re-evaluating existing instruments across countries, it would therefore be valuable to develop novel objective measures which are less dependent on education and literacy.
Assessments of study quality and risk of bias indicated that the majority of studies did not satisfy criteria for sample size justification, participant selection, and nonresponse. It is therefore possible that results were influenced by lack of power and/or a selection bias. This particularly complicates interpreting findings of instruments which were only assessed in one study. Finally, in addition to assessing the validity of premorbid intelligence tasks in dementia, this review set out to collate information regarding the psychometric properties of available instruments. However, only one of the studies reviewed here, which investigated performance on the WRAT-III, reported test–retest reliability within the dementia patient group (Ashendorf et al., Reference Ashendorf, Jefferson, Green and Stern2009). Reliability testing for other frequently employed tasks is therefore urgently needed.
Conclusion
Early detection and treatment of dementia are highly dependent on accurate information regarding premorbid functioning. The studies reviewed here demonstrate that there is a large number of tasks available for estimating premorbid intelligence, which are predominantly language-based. These verbal tasks appear to hold some promise for the assessment of people with mild dementia, but may be unsuitable for individuals presenting with more severe cognitive impairments. Conclusions are limited by the fact that few tools have been investigated across multiple studies and direct comparisons of different instruments are rare. In addition, while there is some evidence supporting the use of verbal assessments across different languages, more extensive testing is needed to determine whether such measures are suitable for use in countries with lower socioeconomic and educational resources. We propose that, in clinical practice, it may be sensible to combine tools based on different mechanisms (e.g. word reading and demographic equations) in order to improve estimates of intelligence. In addition, longitudinal studies contrasting different measures would be valuable to confirm the validity of premorbid intelligence measures, and could thereby contribute to enhancing diagnostic procedures for people living with dementia worldwide.
Conflict of interest
None.
Description of authors’ roles
M. Overman designed the study, collected and analyzed the data, and wrote the paper. S. Leeworthy assisted with data collection and analysis. T. Welsh was responsible for the supervision of the study and revising drafts of the paper.