Introduction
Accurate neuropsychological classification of mild cognitive impairment and dementia benefits from the use of formal premorbid ability estimates to characterize the magnitude of cognitive change. Failure to employ premorbid estimates results in misclassification bias in Black older adults and can lead to findings of an apparently higher cognitive burden of Alzheimer’s disease and related dementias compared to non-Hispanic Whites (Barnes, 2022). The number of years of completed education is a commonly used metric for interpreting the clinical significance of cognitive test results. Because level of education is a weak proxy for premorbid cognitive functioning, owing to variability in educational standards and quality, alternative approaches have been developed for estimating cognitive reserve, including single word reading measures such as the National Adult Reading Test (Nelson & Willison, 1991), Test of Premorbid Functioning (Holdnack et al., 2013), and Wide Range Achievement Test (Wilkinson & Robertson, 2017).
Unlike education level, which is influenced by a variety of sociocultural and economic factors, single word reading reflects formal language competencies from lifelong exposure to and acquisition of written material and general cognitive skills for decoding language (Gershon et al., 2013). A groundbreaking study by Manly et al. (2002) underscored the importance of word reading, relative to years of completed education, in accounting for cognitive functioning. They found that the differences in neuropsychological test performance between Black and non-Hispanic White older adults were lessened or even eliminated when analyses controlled for reading recognition scores from the Wide Range Achievement Test, despite the groups being matched for years of education. Subsequent studies (Dotson et al., 2009; Fyffe et al., 2011) have also demonstrated that reading level is more strongly associated with cognitive performance than is education.
An alternative method for characterizing premorbid ability involves measurement of receptive vocabulary, i.e., knowledge of word meaning (e.g., Peabody Picture Vocabulary Test; Dunn, 2019). Carvalho et al. (2015) reported that Shipley Vocabulary Test scores eliminated racial differences in memory and executive functioning scores. Apart from this study, however, the influence of receptive vocabulary scores versus years of education on additional cognitive domains in Black compared to White older adults has not, to our knowledge, been assessed. Therefore, in the current study, we evaluated the contribution of the NIH Cognitive Toolbox Picture Vocabulary Test scores versus years of education to the neuropsychological performance of Black and White healthy volunteers.
The Toolbox Picture Vocabulary Test was developed as a desktop-administered measure and was subsequently adapted as an iPad-administered measure of receptive vocabulary whereby an individual hears a spoken word presented via the iPad and then chooses the image, among four pictured alternatives, that best depicts the meaning of the word. Gershon et al. (2014) provided concurrent validation of the Toolbox Picture Vocabulary Test as a receptive language measure via high correlations with the Peabody Picture Vocabulary Test in healthy controls (r = .80), and weak correlations with memory measures (r = .11 with the Rey Auditory Verbal Learning Test; r = .10 with the Brief Visuospatial Memory Test) in adults 20–85 years old. We hypothesized that controlling for receptive vocabulary scores would eliminate or attenuate differences in performance between Black and White older adults and would be superior to controlling for years of education, especially on verbally mediated neuropsychological measures including naming, letter and category fluency, and word list recall.
Methods
Participants were enrolled in the Emory Healthy Brain Study (EHBS), a preclinical Alzheimer’s disease biomarker discovery project designed to capture early conversion from normal age-related cognitive performance. The EHBS includes a community-based prospectively enrolled cohort of cognitively healthy participants between 50 and 75 years of age (Goetz et al., 2019). Participants are self-declared cognitively normal without functional limitations and are without neurological diagnoses suggesting prodromal or current degenerative disease. With the exception of schizophrenia, there is no exclusion for psychiatric conditions. This project was approved by the Emory University Institutional Review Board in accordance with the Declaration of Helsinki, and all participants provided written informed consent.
Procedures
The NIH Cognitive Toolbox Picture Vocabulary Test (Gershon et al., 2013, 2014) is an iPad-administered measure of receptive vocabulary. Each word is presented via an audio file, and four objects, actions, or concepts are shown on the screen. The participant chooses the picture which most closely depicts the meaning of the spoken word. The measure employs an adaptive testing approach calibrated for response accuracy.
Neuropsychological measures administered in the EHBS cohort (see Goetz et al., 2019) include the Montreal Cognitive Assessment (MoCA), Multilingual Naming Test (MiNT), Phonemic Fluency, Category Fluency, Judgment of Line Orientation (JLO), Number Span Forward and Backward, Symbol Digit Modalities Test (SDMT), Trail Making Test A&B, Rey Auditory Verbal Learning Test (RAVLT), and Rey-Osterrieth Complex Figure Test (RCFT). Self-reported depression and anxiety are measured using the eight-item Patient Health Questionnaire (PHQ-8) (Kroenke et al., 2009) and the seven-item Generalized Anxiety Disorder Scale (GAD-7) (Spitzer et al., 2006).
Analyses
Analyses of variance, chi-square tests, and Pearson correlations were conducted to evaluate demographic and Toolbox Picture Vocabulary Test contributions to group performances. Possible group differences in self-reported depression and anxiety were examined with analyses of variance. Analyses of covariance with race as the independent variable were conducted with either age, sex, and years of education or age, sex, and vocabulary scores as covariates to evaluate whether differences in performance between Black versus White participants were attenuated or eliminated with vocabulary scores substituted in the models. All analyses were performed via SPSS v29. A Bonferroni-corrected threshold of p ≤ .003 (.05/15 neuropsychological tests) was used for statistical significance to control the family-wise error rate. Effect sizes were evaluated using partial eta squared (ηp²), which measures the proportion of variance in performance accounted for by the independent variable, where ηp² ≥ .01 indicates a small effect, ≥ .06 a medium effect, and ≥ .14 a large effect (Cohen, 1988). Cramer’s V was used for chi-square tests to measure the strength of association between two nominal variables, where V ≥ .10 is considered a small effect, ≥ .30 a medium effect, and ≥ .50 a large effect (Cramer, 1946).
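The significance threshold and effect-size conventions described above can be illustrated with a short sketch. This is not the SPSS procedure used in the study; it only applies the standard textbook formulas for a Bonferroni threshold, partial eta squared, and Cramer’s V. The function names and the "below small" label are assumptions introduced here for illustration.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramer's V for an n_rows x n_cols contingency table with n observations."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


def effect_label(value: float, small: float, medium: float, large: float) -> str:
    """Map an effect size onto verbal labels given the three cutoffs.

    The "below small" label is a hypothetical placeholder, not a term
    used in the article.
    """
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "below small"


# Threshold described in the text: .05 divided across 15 neuropsychological tests.
threshold = bonferroni_threshold(0.05, 15)
print(f"Bonferroni threshold: {threshold:.4f}")

# Labels for two partial eta squared values reported in the Results,
# using the .01 / .06 / .14 cutoffs.
print(effect_label(0.14, 0.01, 0.06, 0.14))   # vocabulary group difference
print(effect_label(0.126, 0.01, 0.06, 0.14))  # MiNT, Model 1
```

Keeping the cutoffs as arguments rather than constants lets the same helper serve both partial eta squared (.01, .06, .14) and Cramer’s V (.10, .30, .50).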
Results
Participants were 1,007 older adults, recruited between April 2016 and May 2023, who spoke English as their primary language and had MoCA scores ≥ 24/30 points. Race was self-identified. The groups did not significantly differ in years of age (M (SD): Black = 63.0 (6.9), White = 63.3 (6.4); p = .559, ηp² = .000). Although there were trends for the groups to differ in the proportion of females and years of education, these results had small associated effect sizes (Female: Black = 75%, White = 66%; p = .052, V = .061; Education: M (SD): Black = 16.6 (2.2), White = 17.0 (2.0); p = .017, ηp² = .006). NIH Toolbox Picture Vocabulary Test scores were significantly lower (p < .001, ηp² = .14) in Black (M (SD) = 109.2 (7.5)) than in White (M (SD) = 118.4 (7.9)) participants. Years of education were significantly correlated with vocabulary scores in both Black (r = .34, p < .001) and White (r = .32, p < .001) participants, and the strength of the associations did not significantly differ between groups (r-to-z transformation; z = 0.24, p = .810). The groups did not significantly differ in PHQ-8 scores (M (SD): Black = 1.7 (2.0), White = 1.7 (2.3); p = .887, ηp² = .000) or GAD-7 scores (M (SD): Black = 1.3 (2.1), White = 1.3 (2.2); p = .807, ηp² = .000).
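The comparison of the two education and vocabulary correlations above relies on an r-to-z transformation. The following is a minimal sketch, assuming the standard Fisher test for two independent correlations; the study's own computation is not shown in the text. Per-group sample sizes are not reported in this section, so the group sizes in the example call are hypothetical placeholders and the resulting z will not necessarily match the reported value.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int):
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns the z statistic and its two-tailed p-value.
    """
    z1 = math.atanh(r1)  # Fisher transformation of the first correlation
    z2 = math.atanh(r2)  # Fisher transformation of the second correlation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    return z, two_tailed_p(z)


# Correlations from the text; the group sizes are HYPOTHETICAL placeholders.
z_stat, p_value = compare_independent_correlations(0.34, 200, 0.32, 800)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")

# Converting the reported z statistic to a two-tailed p-value.
print(f"p for z = 0.24: {two_tailed_p(0.24):.3f}")
```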
Table 1 shows the performance of Black and White participants on each neuropsychological measure. In Model 1 with age, sex, and years of education covaried, there was a significant main effect of Race for all tests except for Number Span Forward and Backward. Effect sizes were largest for the MiNT (ηp² = .126) and JLO (ηp² = .131). In Model 2 with the vocabulary score substituted as a covariate, the main effect of Race was no longer significant for the MoCA, Phonemic Fluency, and RAVLT total learning recall, immediate recall, and delayed recall, and RCFT immediate recall and delayed recall. Although still significantly different between Black and White participants, the effect sizes for Animal Fluency, Trails B-A, SDMT, and RCFT copy condition were all reduced, with the greatest reductions occurring for the MiNT (ηp² = .038) and JLO (ηp² = .078).
+ Montreal Cognitive Assessment (MoCA), Multilingual Naming Test (MiNT), Phonemic Fluency, Category Symbol Digits Modality Test, Rey Auditory Verbal Learning Test (RAVLT), and Rey-Osterrieth Complex Figure Test (RCFT).
++Effect sizes evaluated via partial eta squared, where .01 indicates a small effect, .06 a medium effect, and ≥.14 a large effect.
Discussion
These findings support the clinical value of using receptive vocabulary as a proxy for premorbid ability level when comparing the cognitive performance of Black and White older adults. The results extend investigations using measures of single word reading (Dotson et al., 2009; Fyffe et al., 2011; Manly et al., 2002) to encompass measures assessing word meaning. In addition, they support the validity of using the NIH Toolbox Picture Vocabulary score to control for performance differences. To our knowledge, this subtest of the NIH Toolbox has not been previously examined in this manner. The elimination of a significant main effect of race on the MoCA when the vocabulary score was covaried is especially striking since scores on the MoCA are strongly associated with demographic variables including both education and race (Ratcliffe et al., 2024). The MoCA is not only a widely used tool for classifying cognitive status as normal or impaired; its scores are also used as cutoffs for determining clinical trial eligibility. The lack of racial diversity in pharmacological treatment trials has been attributed, in part, to Black participants and other minorities not meeting eligibility criteria on cognitive screening measures (Franzen et al., 2022). Thus, adjustment of the total score by a premorbid measure of ability level, as opposed to adding one point for ≤ 12 years of education as is currently done for many versions (https://mocacognition.com/faq), may provide a more meaningful eligibility score.
For example, rather than using a one-point adjustment for the MoCA, a regression-based approach that includes a premorbid adjustment of the score based on receptive vocabulary might lead to more equitable and inclusive eligibility criteria. Such regression-based approaches could also be applied to other cognitive test scores.
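One way such a regression-based adjustment could work is sketched below. This is an illustrative assumption, not a validated scoring procedure and not a method reported in this study: a simple linear regression of the test score on vocabulary is fit in a reference sample, and each observed score is then re-expressed as the reference mean plus the individual's residual from the vocabulary-predicted score. All data values and function names are hypothetical.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y on a single predictor x.

    Returns (intercept, slope).
    """
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def vocabulary_adjusted_score(observed, vocabulary, intercept, slope, reference_mean):
    """Reference mean plus the residual from the vocabulary-predicted score."""
    predicted = intercept + slope * vocabulary
    return reference_mean + (observed - predicted)


# HYPOTHETICAL reference sample: vocabulary scores and cognitive screening scores.
reference_vocab = [95, 100, 105, 110, 115, 120, 125]
reference_scores = [24, 25, 25, 26, 27, 27, 28]

intercept, slope = fit_simple_regression(reference_vocab, reference_scores)
reference_mean = mean(reference_scores)

# The same observed score of 25 is adjusted upward when vocabulary is low
# and downward when vocabulary is high.
low_vocab = vocabulary_adjusted_score(25, 100, intercept, slope, reference_mean)
high_vocab = vocabulary_adjusted_score(25, 120, intercept, slope, reference_mean)
print(f"adjusted (vocabulary 100): {low_vocab:.2f}")
print(f"adjusted (vocabulary 120): {high_vocab:.2f}")
```

In practice, any such adjustment would need to be derived from an appropriate normative sample and would likely include additional predictors such as age and sex.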
We hypothesized that verbally mediated measures would be most impacted after controlling for the vocabulary score. We found that performance differences between Black and White participants were eliminated or attenuated on verbally based measures such as timed phonemic fluency, verbal memory, and naming. In fact, the largest effect size reduction on verbal measures occurred on the MiNT (ηp² = .038 with vocabulary vs. ηp² = .126 with education). Stasenko et al. (2019) found a stronger effect of education on naming performance in cognitively normal Black participants than in White participants enrolled in the National Alzheimer’s Coordinating Center, such that lower education was associated with worse naming scores in Black participants, which also suggests that education level may not be an accurate control variable in racial comparisons. Findings in our study were not limited to verbal measures: effect sizes were also reduced for purportedly non-verbally mediated measures of visuoconstructional ability, visual memory, and visuospatial functioning. The reduction in effect size for these measures was especially pronounced on the JLO, from ηp² = .131 to .079. Using a measure of reading ability (WRAT-3), Dotson et al. (2009) also found that associations between race and education were eliminated for Black participants across both verbal and nonverbal measures when reading level was adjusted in analyses. Thus, controlling for premorbid ability has widespread implications for a range of neuropsychological measures. Measures such as the RCFT, for example, recruit multiple strategies entailing not only visual analysis but also planning and organization (‘executive functions’), which are sensitive to vascular cognitive impairment (Salvadori et al., 2019) and thus could be impacted by additional sources of variance.
There are several limitations to our study. First, because we did not administer measures of both vocabulary and reading level to the same sample, we cannot determine whether one measure is more strongly associated with eliminating or weakening cognitive test score differences. We would predict that vocabulary as a measure of semantic knowledge would reduce effect sizes to a greater extent, and future studies should consider comparing these two measures of premorbid ability. Second, our study was limited to a single test of premorbid ability and does not take into account other complex influences that also contribute to cognitive performance differences, such as additional sources of variability in quality of education and social determinants of health. Seblova et al. (2023) defined education quality using indicators including dropout rates, number of teachers with graduate training, teacher salaries, term length, and school size. Of these, attendance at schools where there were more teachers who had achieved graduate training was a consistent predictor of later-life cognitive performance, especially on language measures. Social determinants of health represent another broad term that encompasses many influences such as the neighborhood and physical environment (e.g., housing, transportation, safety, walkability, parks), healthcare (coverage, availability, provider cultural competence, quality of care), and the community and social context (discrimination, social integration, support systems, community engagement). As Barnes (2022) points out, these determinants represent influences over the life-course rather than static measures solely reflecting early life exposures. Another limitation of our study is the homogeneity of our sample in terms of education and vocabulary.
All participants had at least a high school education (range = 12–20 years) and demonstrated average to above-average vocabulary scores (range = 86–139). To comprehensively assess the influence of education and vocabulary on cognitive test performance, future studies should include participants with a broader range of educational attainment and vocabulary levels, including those with lower education and vocabulary scores. The lack of broader ethnic and racial diversity in our study sample also limits generalizability; whether these findings extend beyond Black and White older adults remains to be determined.
In summary, receptive vocabulary as measured via the NIH Toolbox Picture Vocabulary Test attenuates or eliminates differences in cognitive performance between Black and White older adults and appears superior to using years of completed education as a proxy for premorbid functioning. The importance of controlling for premorbid ability is underscored by findings of an increased risk of Alzheimer’s disease (AD) in Black older adults, which has been attributed to misclassification bias as one possible reason (Barnes, 2022), such that lower scores on cognitive tests in Black compared to White older adults may lead to an increased diagnosis of AD in the former group. Studies that find a comparable rate of cognitive decline (Weuve et al., 2018; Wilson et al., 2015), despite an earlier diagnosis of AD in Black older adults, produce a paradox, because one would expect that a greater degree of pathology in Black older adults would result in a faster trajectory. However, there is controversy regarding whether misclassification bias is the sole explanation, as one study (Amariglio et al., 2020) found that rates of decline on the Preclinical Alzheimer Cognitive Composite-5 were faster in Black individuals even when controlling for amyloid burden. Efforts to identify whether disparities exist should attempt to level comparisons by ensuring that premorbid differences affecting cognitive test scores are accounted for as much as possible. Such attempts should be made not to obscure the detection of factors that contribute to disease risk but rather to ensure equitable and inclusive access to resources to treat disease.
Funding statement
This research was supported by funding from the National Institute on Aging (Emory Healthy Brain Study: R01-AG070937, JJ Lah, PI).
These findings were presented at the 2024 Annual Meeting of the International Neuropsychological Society.
Competing interests
None.