Introduction
Verbal fluency (VF) involves expressive language abilities, storage of language knowledge, and executive functions. VF tests are typically brief assessment instruments that permit the evaluation of these cognitive processes with simple administration and scoring procedures (Lezak et al., 2012; Pekkala, 2012; Strauss et al., 2006). Due to their high sensitivity to neurological damage, they are widely used in clinical evaluation and research in areas such as neuropsychology, speech therapy, linguistics, and medicine (see, for example, Catani et al., 2013; Faroqi-Shah & Milman, 2018; Herbert et al., 2014).
Both in clinical practice and in research, local normative data are essential for correctly interpreting a person's scores on one or more neuropsychological tests, whether the objective is diagnosis or a judgment about cognitive state. In the clinical setting, a patient is compared with a group of people who share the same sociodemographic characteristics. When reference standards must be chosen for this purpose, the first to be rejected should be those developed with the smallest sample sizes (Mitrushina et al., 2005). Standards that were developed more than a decade ago, or that may be biased by cultural differences or other factors (low educational level, etc.), should also be rejected. Normative studies provide what has been called "clinical comparison data" (Mitrushina et al., 2005), which represent the range of performance on a test of different groups characterized by medical, psychiatric, and/or neurological criteria, who present homogeneous demographic traits.
Previous initiatives have provided norms for phonological fluency (PF) tests in different populations of native Spanish speakers from Spain: Peña-Casanova et al. (2009), for community-dwelling cognitively normal adults (n = 346) ranging from 50 to 94 years of age, used the letters P, M, and R; Casals-Coll et al. (2013), for a younger population of adults between 18 and 49 years old (n = 179), used the letters P, M, and R; and Lubrini et al. (2022), who assessed participants from 17 to 100 years old (n = 257), used the letters F, A, and S. Given the criteria mentioned above for choosing reference normative data, the studies carried out with small samples (Casals-Coll et al. and Lubrini et al.) would be the first to be discarded. As for the date of publication, the work by Peña-Casanova et al. is 14 years old, so it could be considered outdated. Furthermore, Peña-Casanova et al. used an overlapping strategy to distribute their sample (n = 346) across 10 groups aged from 55 to more than 80 years, thus artificially increasing the number of cases in each age range. Updated norms with larger samples are therefore necessary for both young and older Spanish adult populations.
According to data from the National Institute of Statistics (INE) of Spain, elderly people now represent approximately 19.5% of the total population. The mean age of the population stands at 43.81 years, compared with 32.7 in 1970. According to the INE projection, in 2035 there could be more than 12.8 million older people, 26.5% of the total population (Pérez et al., 2022). The need for valid and accurate normative data is especially important for older people, as this group is at special risk of cognitive impairment or dementia. As a result, it is necessary to develop global programs and increase resources focused on promoting prevention and early diagnosis. In this context, normative studies conducted by the SCAND initiative (www.scandcognition.org), such as the one presented here for PF with a sample of middle-aged and older adults, make sense.
Given their importance, other international precedents should be mentioned. One example is the Neuropsychological Norms for the US-Mexico Border Region in Spanish (NP-NUMBRS) project (Rivera Mindt et al., 2021) and, of particular relevance to the present study, its normative data on VF (Marquine et al., 2021). Another is the Mayo Clinic initiative, which started in the 1990s (Ivnik et al., 1996) and has provided norms for the most important neuropsychological tests in different populations (see, for example, Lucas et al., 2005).
Numerous studies have pointed out the influence of sociodemographic variables as possible moderators of VF, with age, gender, and education being the most studied (Henry & Phillips, 2006; López-Higes et al., 2022; Mathuranath et al., 2003). Some authors have reported that men perform worse than women in PF tasks (Loonstra et al., 2001), whereas many studies have not shown such differences (Costa et al., 2014; Khalil, 2010; Kozora & Cullum, 1995; Peña-Casanova et al., 2009; Tombaugh et al., 1999). Some studies have found different effects of age when comparing semantic fluency (SF) and PF (Santos Nogueira et al., 2016; Tombaugh et al., 1999), whereas others did not (Khalil, 2010; Loonstra et al., 2001). However, the impact of educational level on the performance of VF tasks has been widely recognized in neuropsychological research (Lubrini et al., 2022; Peña-Casanova et al., 2009). Higher education levels are associated with the production of more words (Oberg & Ramírez, 2006). The significant effect of educational level on PF could be related to the fact that these tasks are more demanding than semantic tasks and more sensitive to executive dysfunction (Shores et al., 2006). Formal education may increase vocabulary and consequently lead to greater verbal lexical retrieval capacity (Henry & Phillips, 2006). In fact, education is the major factor that contributes to performance in PF.
Performance on a PF test is usually evaluated by the total number of correct words given within the time limit (Lezak et al., 2012; Pekkala, 2012; Strauss et al., 2006). However, this score provides little information about the cognitive processes underlying fluency performance, so some authors have proposed additional measures, such as the error types in PF, which typically include perseverations (repetition of the same correct word) and intrusions (words with another initial letter) (Thiele et al., 2016). Although errors are relatively rare in normative data, investigating the number and types of errors is useful in research and clinical practice, because pathology is indicated not only by a decline in response counts but also by alterations in performance patterns.
Research suggests that different mechanisms probably underlie perseverations and intrusions. Perseverations have been linked to a frontal lobe dysfunction characterized by intellectual rigidity and inability to shift mental sets (Miller & Cohen, 2001), which is common in neurologic disorders. Recording the number of perseveration errors may therefore provide more information about the status of the central executive component of working memory.
Intrusions on a PF task could be executive errors (e.g., forgetting the rules, losing the set, and using rules from a different fluency trial; McDowd et al., 2011) or phonetic/spelling related (a word with a similar sound but incorrect letter; Rofes et al., 2019). These errors have been studied in different pathologies such as brain injury, Parkinson’s disease, or Alzheimer’s disease (Smith et al., 2020), suggesting that intrusions are typical of senile dementia of the Alzheimer’s type and may help distinguish it from other causes of dementia. Regarding the clinical usefulness of error pattern analysis, it is important to remember that errors are best interpreted within the context of the whole neuropsychological assessment.
In some studies, time has also been considered. In these cases, the number of words generated during four segments of 15 seconds each has been evaluated. In general, participants produce most of the words in the early stages of the task (first 30 seconds) using a semi-automatic rapid retrieval process. Cognitive demand is not uniform throughout the PF task since, as time progresses, lexical retrieval becomes more difficult, and therefore fewer words are produced in the final moments of the task (Venegas & Mansur, 2011). Furthermore, it has been found that the educational level of the participants has a positive effect on the first time segments in both PF and SF, while age was not significant (Venegas & Mansur, 2011).
Several studies in languages other than English have aimed to adapt their letter sets so that they pose a similar difficulty to "F, A, and S" in English. For instance, the "P, M, and R" set has been proposed as more appropriate for Spanish speakers for several reasons (Fortuny et al., 1998). First, words beginning with "F" are rare in Spanish. Second, although words beginning with "A" are common, the starting "HA" (as in "hábito", habit) is also very frequent, which may be disadvantageous for people with low levels of literacy since the letter H is silent in Spanish. Finally, the "S" sound may be confusing in some regions of Spain and Latin America ("C" in the sequences "CE/CI", as in "celebración", celebration, and "Z" in the sequences "ZA/ZO/ZU", as in "zapato", shoe, are pronounced like "S"), which again poses a disadvantage for people with low literacy.
The availability of normative data on PF, adjusted for age, education, or sex, can help in the early detection of cognitive impairment and the measurement of clinically significant changes.
The main purpose of this paper is to offer updated normative data on PF (P, M, R, and P + M + R) for native Spanish-speaking middle-aged and older adults from Spain (over 50 years old), considering sociodemographic factors (age, education, and sex) and based on a large sample. Instead of providing normative data only for a single outcome measure (the total number of words evoked), as previous studies did, a novel contribution of the present work is the inclusion of both errors and word production during 15-second segments as additional measures for analysis.
Method
Participants
Participants were selected using the following inclusion criteria: (1) community-dwelling individuals; (2) over 50 years of age; (3) Mini-Mental State Exam (MMSE; Lobo et al., 1999) score greater than or equal to 24 points; (4) 15-item Geriatric Depression Scale (GDS-15; Yesavage et al., 1983) score below or equal to 9 points; (5) normal cognitive development, not meeting diagnostic criteria for Mild Cognitive Impairment (MCI) (Petersen, 2004) in at least two previous consecutive assessments; (6) being able to manage an independent life without any severe mental disorder (cognitive or psychiatric) impeding daily functioning; (7) normal or corrected hearing and vision; (8) basic reading comprehension and writing abilities in Spanish; and (9) signed written informed consent.
A total of 1165 Spanish speakers without cognitive impairment aged between 50 and 89 years were recruited. Sampling was non-probabilistic and incidental. Most of the participants lived in urban areas (93%), and none received any type of remuneration for their participation. All participants were asked to produce words that began with P. Letters M and R were also administered to a subsample of participants (n = 852) aged 60 to 89 years.
Before unifying data from the full ‘P’ sample with the ‘M and R’ subsample, we verified that there were no statistically significant differences in performance that could be due to the group [F(1,1165) = 2.44, p = .118] or its interaction with sociodemographic variables: group × sex [F(1,1165) = 2.35, p = .125]; group × educational level [F(3,1163) = 1.97, p = .117]; group × age group [F(2,1164) = 1.33, p = .264]; group × sex × educational level [F(3,1163) = 2.19, p = .087]; group × sex × age group [F(2,1164) = 0.35, p = .703]; group × educational level × age group [F(6,1160) = 0.80, p = .565]; or group × sex × educational level × age group [F(5,1161) = 0.86, p = .503].
Table 1 shows the distribution by sex and educational level for the total sample and the subsample. Spanish people aged 50 or more completed their studies under the educational law enacted in 1970 or under the preceding law, established in 1953 and reformed in 1967. For university studies, a 1943 law was replaced in 1970 by another that distinguished different levels (diplomado, licenciado, doctor; in English: diplomate, graduate, doctorate). To cope with the heterogeneity of levels or grades (EGB, elemental bachelor, superior bachelor, BUP, etc.), we used the following equivalents, related to international categories. We considered educational level as an ordinal variable (with values ranging from 0 to 3): ‘Without formal education’ (0): less than 6 years of schooling; ‘Primary studies’ (1): between 6 and 11 years of schooling; ‘Secondary studies’ (2): between 12 and 15 years of schooling; ‘Higher studies’ (3): more than 15 years of formal education. The level ‘Without formal education’ corresponded to people who are literate but could only attend school for 2 or 3 years (they have only basic reading and writing skills and simple mathematical calculation). The mean age of the total sample was 72.08 years (SD = 6.46) and that of the subsample was 74.01 years (SD = 3.83). The median educational level was 2 for both the total sample and the subsample. Sex was coded as 0 for females and 1 for males. The percentage of women was 65% in the total sample and 63% in the subsample.
Note. Educational level is an ordinal variable, with values ranging from 0 to 3. ‘Without formal education’ (0): less than 6 years of schooling; ‘Primary studies’ (1): between 6 and 11 years of schooling; ‘Secondary studies’ (2): between 12 and 15 years of schooling; ‘Higher studies’ (3): more than 15 years of formal education
The Spanish Consortium for Ageing Normative Data (SCAND) initiative takes the data from three Spanish cohort studies, Aging Brain Projects of the Complutense University of Madrid, the Vallecas Project, and the Compostela Aging Study. The SCAND initiative was developed with the aim of sharing data on the Spanish middle-aged and old adult population provided that the above-mentioned studies share a set of neuropsychological tests in their evaluation protocols. All participants in the total sample were selected given that they had been involved in different studies about the aging process and were recruited between 2008 and 2019.
All studies complied with the ethical standards of the Declaration of Helsinki and were approved by the local Ethics Committees of the participant institutions.
Instruments
In the present study, three letters were considered (P, M, and R), all of which were part of the comprehensive neuropsychological assessment protocol used by each of the SCAND research groups, which included screening tests, scales, and other tests belonging to different cognitive domains (memory, executive functions, and language). Initially, the PF task with the letter P was included in the protocol. Although this is common among a significant number of clinicians, we also found some cases in which the letters M or R were used in isolation. For this reason, we later began to use the three letters (P, M, and R) with participants. Participants were asked to generate as many words as possible beginning with these initial letters in 60 seconds. For each letter, the number of correct words was registered, excluding intrusions and perseverations. Errors (perseverations and intrusions) and words produced every 15 seconds were recorded only for the subsample.
Procedure
All participants completed a structured interview to collect sociodemographic data, screening tests, and an extensive neuropsychological assessment, including memory, executive functions, and language tests, administered by neuropsychologists well-trained in the use of neuropsychological assessment tools. All participants were informed about the main research aspects, and they signed a written informed consent before performing any study procedure.
According to standard instructions (e.g., Lezak et al., 2012), participants were asked to generate in 60 seconds as many words as possible that began with each letter (always presented consecutively in this order: P, M, and R), excluding proper names and repetitions of the same word with different endings. As participants named words beginning with each letter, two neuropsychologists recorded correct words, perseverations, and intrusions in the order in which they were generated. Two raters were used to ensure the reliability of the scoring procedure. The inter-rater reliability was near .99 in all cases.
Data analyses
The statistical procedure was as follows: First, the cumulative frequency distribution of the raw scores was generated. Percentile ranges were assigned to the raw scores depending on their place in distribution. Then the percentile ranges were converted to scaled scores (ss) (range 2 to 19) using the formula ss = 10 + 3*Z where Z is the normalized standard score corresponding to the percentile. This transformation of raw scores produced a normal distribution (M = 10 and SD = 3), which allows the application of linear regressions to test the effect of sociodemographic variables and to calculate the scaled-adjusted scores (ssadjusted).
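The conversion described above can be sketched in Python as follows. This is a minimal illustration, not the authors' SPSS procedure: the function name `scaled_scores` is hypothetical, the mid-rank definition of the cumulative percentile is an assumption (the text does not specify how ties or the extremes of the distribution were handled), and rounding to the nearest integer is likewise assumed.

```python
from statistics import NormalDist


def scaled_scores(raw_scores):
    """Map each distinct raw score to a normalized scaled score (range 2 to 19).

    Assumption: the cumulative proportion uses the mid-rank convention,
    which keeps it strictly between 0 and 1 so the normal quantile exists.
    """
    n = len(raw_scores)
    table = {}
    for value in sorted(set(raw_scores)):
        below = sum(1 for s in raw_scores if s < value)
        equal = sum(1 for s in raw_scores if s == value)
        proportion = (below + 0.5 * equal) / n
        z = NormalDist().inv_cdf(proportion)  # normalized standard score
        ss = round(10 + 3 * z)                # ss = 10 + 3 * Z, as in the text
        table[value] = max(2, min(19, ss))    # clamp to the stated range
    return table


# Toy data, not taken from the study.
print(scaled_scores([1, 2, 3]))
```

With this convention, a raw score at the exact middle of the distribution receives ss = 10, and scores further from the middle move away from 10 in steps of 3 per standard deviation.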
Second, the effects of age, education, and sex were verified. For each letter, three univariate regressions were calculated on ss with age, education, and sex as predictors. Corrections were only applied for those sociodemographic variables that yielded a significant regression coefficient (p < .05) and that also explained more than 5% of the variance (Lee, 2014). Finally, adjustments were made according to age, education, and sex on the ss, using the following formula:
$$ss_{\text{adjusted}} = ss - \left[ B_1 (\text{Age} - \text{Mean}_1) + B_2 (\text{Education} - \text{Median}_2) + B_3 (\text{Sex} - \text{Mode}_3) \right]$$
In the formula, and due to the level of measurement of each of the variables, the mean is subtracted from age (continuous ratio level), the median from education (ordinal level), and the mode from sex (nominal level), so that the adjusted scores provide a better-standardized reference. All analyses were performed with SPSS version 25.
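As a rough illustration of how the adjustment formula operates, the sketch below applies it in Python. The function name `adjust_scaled_score` and the coefficient values are hypothetical: the regression coefficients B are not reported in this section, so the value used for education (1.0) is chosen only to make the arithmetic easy to follow, and the coefficients for age and sex are set to zero as placeholders. The centering values (mean age 72.08, median education 2, modal sex 0 = female) are those reported for the total sample.

```python
def adjust_scaled_score(ss, age, education, sex, coefs, centers):
    """Apply ss_adjusted = ss - [B1*(Age - Mean) + B2*(Edu - Median) + B3*(Sex - Mode)]."""
    correction = (
        coefs["age"] * (age - centers["age"])
        + coefs["education"] * (education - centers["education"])
        + coefs["sex"] * (sex - centers["sex"])
    )
    return ss - correction


# Centering values reported for the total sample.
CENTERS = {"age": 72.08, "education": 2, "sex": 0}

# Hypothetical coefficients, for illustration only (not reported in the text).
ILLUSTRATIVE_COEFS = {"age": 0.0, "education": 1.0, "sex": 0.0}

# A person without formal education (level 0) and an unadjusted ss of 12.
print(adjust_scaled_score(12, 65, 0, 1, ILLUSTRATIVE_COEFS, CENTERS))
```

Because each predictor is centered on its typical value, a person at the mean age, median education, and modal sex receives no correction at all, which is the sense in which the adjusted scores act as a standardized reference.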
Results
Table 1 shows the distribution of women and men by educational level and age group.
Correlations between the individual letter scores, and between each letter score and the P + M + R score, were high and statistically significant (P with M: r = .703; P with R: r = .718; M with R: r = .747; P with P + M + R: r = .896; M with P + M + R: r = .902; R with P + M + R: r = .910; p ≤ .001 in all cases).
Norms for the total number of correct words produced by participants
Table 2 shows the descriptive statistics of the total number of correct words by sociodemographic variables. Age was categorized only to show its distribution in the tables but was included as continuous in all statistical analyses.
Determination coefficients corresponding to education exceeded the criterion for the P, M, R, and P + M + R scores, explaining 19.4% (r = .444; p < .001; R² = .194), 22.9% (r = .478; p < .001; R² = .229), 21.8% (r = .467; p < .001; R² = .218), and 26.6% (r = .516; p < .001; R² = .266) of the variance, respectively. In contrast, age and sex did not reach the 5% criterion for the percentage of variance explained (R² < .05 in all cases).
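The two-part decision rule used here (a significant coefficient and more than 5% of variance explained) can be written as a small check. The sketch below is illustrative only: the function name `needs_correction` is hypothetical, the R² values are those reported above for education, and because the text gives p only as "< .001", the value .001 is used as an upper bound rather than as an exact p value.

```python
def needs_correction(p_value, r_squared, alpha=0.05, min_variance=0.05):
    """Return True when a predictor is significant AND explains more than 5% of variance."""
    return p_value < alpha and r_squared > min_variance


# R² values reported in the text for education.
REPORTED_EDUCATION_R2 = {"P": 0.194, "M": 0.229, "R": 0.218, "P+M+R": 0.266}

# The text reports p < .001; .001 is used here as an upper bound (assumption).
P_UPPER_BOUND = 0.001

corrected = {
    measure: needs_correction(P_UPPER_BOUND, r2)
    for measure, r2 in REPORTED_EDUCATION_R2.items()
}
print(corrected)
```

Under this rule a predictor that is statistically significant but explains less than 5% of the variance is left uncorrected, which is the situation described for age and sex.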
Table 3 includes unadjusted scaled scores (ss) and percentile ranges corresponding to the total number of words evoked by participants.
The adjustments by education in P, M, R, and P + M + R indicated that for people without formal education, 2 points must be added to their ss score for P, M, and R, and 3 points when the P + M + R score is considered. For individuals with primary education, 1 point must be added for P, M, R, or P + M + R, and for those with higher education, 2 points must be subtracted from the ss score for P, M, R, or P + M + R. No adjustments are needed for the secondary education level. The following example illustrates how to use the tables to select the correct scaled score for a given raw score and how to apply the correction to the unadjusted scaled score when needed. If a patient without formal education produced 16 words with the letter P, we first locate the raw score in Table 3, then read the percentile range at the left (69-75) and the corresponding unadjusted scaled score (ss = 12). To adjust this score for the educational level “without formal education”, two points are added (ssadjusted = 14). For a patient with the same raw score but with a higher educational level, two points are subtracted from the unadjusted scaled score (ssadjusted = 10).
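The education corrections for the total number of correct words can be expressed as a simple lookup, sketched below in Python. The names `TOTAL_SCORE_ADJUSTMENT` and `adjusted_total_ss` are hypothetical; the point values are taken from the paragraph above, with education coded 0 to 3 as in the Method section. The unadjusted ss must still be read from Table 3, which is not reproduced here, and no clamping of the adjusted score is applied because the text does not mention one.

```python
# Points added to the unadjusted ss, by measure and educational level (0 to 3).
TOTAL_SCORE_ADJUSTMENT = {
    "P": {0: 2, 1: 1, 2: 0, 3: -2},
    "M": {0: 2, 1: 1, 2: 0, 3: -2},
    "R": {0: 2, 1: 1, 2: 0, 3: -2},
    "P+M+R": {0: 3, 1: 1, 2: 0, 3: -2},
}


def adjusted_total_ss(ss, education_level, measure="P"):
    """Return the education-adjusted scaled score for the total word count."""
    return ss + TOTAL_SCORE_ADJUSTMENT[measure][education_level]


# Worked example from the text: 16 words with P corresponds to ss = 12 in Table 3.
print(adjusted_total_ss(12, 0, "P"))  # without formal education
print(adjusted_total_ss(12, 3, "P"))  # higher studies
```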
Norms for errors (perseverations and intrusions)
Table 4 summarizes descriptive statistics of perseverations and intrusions by sociodemographic variables. Data for errors were obtained only from the subsample of 852 participants.
Scaled scores and percentile ranges corresponding to perseveration and intrusion errors across letters are shown in Table 5. None of the sociodemographic variables explained at least 5% of the variance, so no adjustments are required (R² < .05 in all cases).
Note that higher scores indicate more perseverations and intrusions and thus worse performance. Pc = Percentile ranges; Rs = Raw scores.
Norms for the total number of words produced every 15 seconds
Descriptive statistics corresponding to the total number of words produced every 15 seconds are shown in Table 6.
Table 7 shows unadjusted scaled scores (ss) and percentile ranges corresponding to this measure across letters.
The only variable that explained at least 5% of the variance was education. From 0 to 15 sec.: r = .299, p = .001, R² = .089 for P; r = .379, p = .001, R² = .143 for M; r = .347, p = .001, R² = .121 for R; and r = .417, p < .001, R² = .174 for P + M + R. From 16 to 30 sec.: r = .338, p = .001, R² = .114 for P; r = .348, p = .001, R² = .121 for M; r = .373, p = .001, R² = .139 for R; and r = .456, p < .001, R² = .207 for P + M + R. From 31 to 45 sec.: r = .345, p = .001, R² = .119 for P; r = .340, p = .001, R² = .116 for M; r = .323, p = .001, R² = .104 for R; and r = .420, p < .001, R² = .177 for P + M + R. From 46 to 60 sec.: r = .281, p = .001, R² = .079 for P; r = .306, p = .001, R² = .094 for M; r = .321, p = .001, R² = .104 for R; and r = .424, p < .001, R² = .180 for P + M + R. Adjustments were therefore needed only for education. For the letter P, ssadjusted is ss + 1 for those without formal education and ss − 1 for those with higher education, in all 15-second segments. For the letter M, in the first segment (0 to 15 sec.) ssadjusted is ss + 2 for patients without formal education, ss + 1 for those with primary education, and ss − 2 for those with higher education; in the remaining segments, ssadjusted is ss + 1 for those without formal education and ss − 1 for those with higher education. For the letter R, in the second segment (16 to 30 sec.) ssadjusted is ss + 2 for patients without formal education, ss + 1 for those with primary education, and ss − 2 for those with higher education; in the remaining segments, ssadjusted is ss + 1 for those without formal education and ss − 1 for those with higher education. For the sum P + M + R, ssadjusted is ss + 2 for patients without formal education, ss + 1 for those with primary education, and ss − 1 for those with higher education, regardless of segment.
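Because the segment-level corrections differ by letter and segment, it may help to see them collected in one place. The Python sketch below encodes the rules stated above; the names `segment_adjustment`, `DEFAULT`, `STRONGER`, and `SUM_SCORE` are hypothetical, segments are numbered 1 to 4 (0 to 15, 16 to 30, 31 to 45, and 46 to 60 sec.), and education is coded 0 to 3 as in the Method section. Levels for which the text states no correction are assumed to receive 0.

```python
# Points added to the unadjusted segment ss, by educational level (0 to 3).
DEFAULT = {0: 1, 1: 0, 2: 0, 3: -1}     # P (all segments); M and R (most segments)
STRONGER = {0: 2, 1: 1, 2: 0, 3: -2}    # M in segment 1; R in segment 2
SUM_SCORE = {0: 2, 1: 1, 2: 0, 3: -1}   # P + M + R, regardless of segment


def segment_adjustment(measure, segment, education_level):
    """Return the points to add to ss for a 15-second segment score."""
    if measure not in {"P", "M", "R", "P+M+R"}:
        raise ValueError(f"unknown measure: {measure!r}")
    if segment not in {1, 2, 3, 4}:
        raise ValueError(f"segment must be 1-4, got {segment!r}")
    if measure == "P+M+R":
        table = SUM_SCORE
    elif (measure, segment) in {("M", 1), ("R", 2)}:
        table = STRONGER
    else:
        table = DEFAULT
    return table[education_level]


# Example: letter M, first segment, person with higher studies.
print(segment_adjustment("M", 1, 3))
```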
Discussion
The present study provides normative data on phonological VF (letters P, M, R) for Spanish middle-aged and older adults, considering sociodemographic factors and different measures such as the total number of words, errors (perseverations and intrusions), and 15-second segmented scores.
Regarding the whole sample, it is important to note that the percentage of women with a lower educational level was higher than that of men, a fact that reflects the reality of the Spanish population over 50 years of age (Pérez et al., 2020). Therefore, the sociodemographic distribution of the participants can be considered representative of the Spanish population, with the sole exception that our group of men has a higher level of education than usual.
All correlations between individual letters and between those and the total P + M + R score are high and similar, which would indicate that a clinician could use a single letter if she/he needs brevity in her/his evaluation protocol instead of P + M + R total score.
Results regarding the total number of words produced by participants indicate that education is the most important variable, since it explains between 19 and 23% of the total variance for the individual letters and 26.6% for the sum P + M + R. The effect of age was also statistically significant, although it explained less than 3% of the variance. Sex did not have any significant effect, in accordance with previous studies (Mathuranath et al., 2003).
Results showing the effect of educational level on the total number of words produced are in accordance with all the studies reviewed. Tombaugh et al. (1999) showed a direct influence of education on PF, accounting for 18.6% of the variance. Mathuranath et al. (2003) reached similar results, indicating a significant influence of education on PF, in which participants with higher educational levels performed better than those with fewer years of schooling. Aziz et al. (2017) showed that educational level significantly influences both phonemic and semantic fluency tasks, with higher educational levels being associated with better performance. This result is constant across studies and appears even for qualitative characteristics of task performance such as clustering and switching (Pereira et al., 2018).
Another set of studies has shown significant effects of both educational level and age. Dursun et al. (2002) studied the effects of age and total years of education on vocabulary performance in healthy volunteers. They found that education and age were overall predictors of total scores, but no correlation was found with sex. Peña-Casanova et al. (2009) reported effects of age and education for different letters, but sex was again not significant. A recent study by Marquine et al. (2021) found a small effect of age and a medium effect of educational level on PF scores, thus showing a pattern similar to that of the present study.
Word production is largely based on verbal skills (vocabulary), memory retrieval and recall, and executive control processes (self-initiation and monitoring to inhibit repetitions and intrusions) (Shao et al., 2014). VF tasks are multidimensional as they rely on other executive functioning skills as well, such as processing speed, cognitive flexibility, working memory, and sustained attention (Diamond, 2013). PF requires search and retrieval strategies dependent on accessing the mental lexicon. A higher educational level entails a larger lexicon and greater verbal lexical retrieval capacity, as well as the use of more efficient information retrieval strategies (Federmeier et al., 2010). Thus, many factors can influence educational attainment. Formal education may increase vocabulary knowledge, a strong predictor of PF performance with age (Henry & Phillips, 2006), and provides contents and procedures frequently included in cognitive testing (Ardila et al., 2000). Cognitively stimulating experiences in early life can enhance brain development and impact cognitive ability later in life (Noble et al., 2015). In line with these arguments, some studies have reported that reading level (a proxy related to cognitive reserve) and PF were moderately correlated (Johnson-Selfridge & Zalewski, 2001). The systematic review conducted by Panico et al. (2022) described two studies that found correlations between cognitive reserve (CR) and PF. Moraes et al. (2013) investigated the correlation between CR (expressed by years of formal education and frequency of reading and writing) and scores on VF tasks (phonemic and semantic fluency), among other tests and tasks. Education showed the best predictive value for PF. Roldán-Tapia et al. (2012) reported that a composite index of CR (including education, occupation, and vocabulary knowledge) significantly correlated with scores on PF. In another interesting study, Kraan et al. (2013) concluded that PF in adults was associated with verbal intellectual function and processing speed.
While traditional normative data studies have examined the total number of words generated as a measure of VF performance, there is evidence suggesting that task performance analysis (errors or temporal analysis) provides valuable additional information (Abwender et al., Reference Abwender, Swan, Bowerman and Connolly2001; Pakhomov et al., Reference Pakhomov, Eberly and Knopman2018). As far as we know, the present study is the first attempt to provide normative data on older Spanish adults concerning the number of perseveration and intrusion errors. Our results showed a small, statistically significant effect of educational level that failed to explain at least 5% of the variance of the errors. Effects of age or sex were not significant. We found no previous studies considering the effects of education on older and middle-aged adults’ errors. As a tentative hypothesis, the fact that education explains a percentage of variance below the 5% criterion might be related to the small range observed in the number of errors. Ranges in raw scores are certainly small, given that the sample is composed of healthy older adults.
Concerning our measures of word generation performance in 15-second time intervals, we likewise found no previous studies providing Spanish normative data on similar measures. Our results indicate that the highest word production occurs in the first 30 seconds of the tests, and a descending curve of word production was observed over the 60-second test period. These results are consistent with those found in the adult population; it has been suggested that production of words is maximal during the initial stages of the task (semi-automatic retrieval) as individuals access a long-term memory store consisting of the highest-frequency, easy-to-retrieve words. When this store is exhausted, the individual attempts to retrieve words from a larger pool of words, making the search process more time-consuming and more difficult (Crowe, Reference Crowe1998; Jacobs et al., Reference Jacobs, Mercuri and Holtzer2021; Raboutet et al., Reference Raboutet, Sauzéon, Corsini, Rodrigues, Langevin and N'kaoua2010). That is, as time on task increased, production decreased, as did the word frequency of the items produced. In a study conducted by Demetriou and Holtzer (Reference Demetriou and Holtzer2017), healthy older adults were fast and efficient at initiating search processes and retrieving words from memory, as evidenced by the larger number of words they produced in the first 20 seconds of the task. However, there was a discrepancy between performance in the first time interval and in the subsequent two intervals, which Demetriou and Holtzer explained by assuming that people had to monitor and inhibit responses that had already been given from a large set of evoked words.
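As an illustration only, the time-segment measure discussed above can be sketched as a simple tally of responses per 15-second interval. This is a minimal sketch under stated assumptions, not the scoring procedure used in the present study: it assumes that the onset time of each valid word (in seconds from the start of the trial) is available, and the function name `words_per_interval`, its parameters, and the example onsets are hypothetical.

```python
from typing import List


def words_per_interval(onsets_s: List[float],
                       task_s: float = 60.0,
                       bin_s: float = 15.0) -> List[int]:
    """Count valid words in consecutive time bins of a fluency trial.

    onsets_s: hypothetical onset times (seconds) of each valid word.
    task_s:   assumed total task duration (60 s here).
    bin_s:    assumed interval width (15 s here).
    Onsets outside [0, task_s) are ignored.
    """
    n_bins = int(task_s // bin_s)
    counts = [0] * n_bins
    for t in onsets_s:
        if 0 <= t < task_s:
            counts[int(t // bin_s)] += 1
    return counts


# Invented example trial with 12 valid words (not real data).
example_onsets = [1.2, 3.0, 4.8, 7.5, 11.0, 14.9,
                  18.3, 24.0, 29.5, 37.2, 44.1, 52.6]
example_counts = words_per_interval(example_onsets)
print(example_counts)  # [6, 3, 2, 1]
```

In this invented trial the counts fall from the first interval to the last, which is the descending shape described above; the interval boundaries (for example, whether a word at exactly 15 seconds belongs to the first or second interval) are a design choice that a real scoring protocol would need to state explicitly.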
Educational level effects appeared in all the 15-second intervals across letters (and in the sum) in the present study. Sex and age had a significant effect on some of the 15-second interval measures, but these effects were very weak, as they failed to explain 2% of the variance. Similarly, Venegas and Mansur (Reference Venegas and Mansur2011) found that participants’ educational level has a positive effect on the first three quartiles in PF, while age was not significant. As suggested by these authors, the first quartile is dedicated to semi-automatic retrieval, while the other quartiles are implicated in planning, adjusting, and monitoring the performance, to guarantee the generation of items and avoidance of repetitions and intrusions. The effect of educational level on word production in the last 15-second interval is also relevant, given that at this point the task requires the most effortful retrieval processes, and people with higher educational attainment should show an advantage given their larger lexicon, greater verbal lexical retrieval capacity, and more efficient information retrieval strategies (Demetriou & Holtzer, Reference Demetriou and Holtzer2017). Congruent with this line of reasoning is the study of Sauzéon et al. (Reference Sauzéon, Raboutet, Rodrigues, Langevin, Schelstraete, Feyereisen and N’Kaoua2011), which revealed a knowledge compensation mechanism in older adults’ letter fluency productions that only occurred during the second period (31–60 sec.) and was related to vocabulary level.
We would like to point out as limitations of this research that we did not use epidemiological recruitment methods and that we did not assess either potential medical and/or psychological conditions that may interfere with cognition or self-reported mood. We did not recruit illiterate participants because they are very unusual in Spain. Although it makes the sample of participants more representative of the Spanish population, the unequal sex distribution of the sample should be added to the list of the study’s limitations. Also, the reader must consider that these normative data are not generalizable to Spanish speakers outside of Spain. When the norms are applied, the demographic characteristics of the person assessed must be similar to those of the normative sample, and the administration and scoring procedures of the test used must be matched too (Mitrushina et al., Reference Mitrushina, Boone, Razani and D'Elia2005).
Conclusions and future directions
The present study provides normative data for healthy older people on the PF task for the letters P, M, and R, and considers errors and production by time segments. The influence of education is in line with other previous studies. These data may also be of considerable use for comparisons with other normative studies in Spain and other countries.
There are very few normative studies for VF tasks for people over 60–75 years of age, and even fewer for Spanish speakers. In addition, to our knowledge, this is the first study to present normative data regarding the number of errors (perseverations and intrusions) and the number of words produced in 15-second intervals. Normative data for PF in specific populations are a useful resource for clinical and research studies and may aid in the early detection of cognitive impairment, diagnosis, establishing prognosis, planning treatment, and monitoring clinically significant changes.
It is of interest to further investigate whether different approaches to quantifying performance on PF, such as error analysis or word generation by time segment, are related to prevalent MCI or predictive of its incidence.
Acknowledgements
We especially thank all the participants for their totally generous collaboration, without which this study would not have been possible.
Competing Interest
None.
Funding
This work was supported by the Ministry of Science, Innovation and Universities of Spain (PSI2009-14415-C03-01, PSI2012-38375-C03-01, PSI2010-22224-C03-01, PSI2015-68793-C3-1-R and RTI2018-098762-B-C31) and the Vallecas Project supported by Reina Sofía Foundation, and PILEP+90 sponsored by Fundación General de la Universidad de Salamanca (FGUSAL)-CENIE within the Interreg V-A Program, Spain-Portugal, (POCTEP), 2014-2020.