Introduction
Fibromyalgia (FM) is a chronic musculoskeletal disorder that affects at least 5 million US adults (Lawrence et al., 2008). In addition to widespread pain, fatigue, and unrefreshed sleep, cognitive difficulties are a common and impairing symptom of FM. In this area of research, a distinction is made between subjective cognitive difficulties (i.e., “fibrofog”), assessed via self-report, and objective cognitive difficulties, measured with neuropsychological tests. The evidence for subjective cognitive difficulties in FM is clear: approximately 70% of individuals endorse fibrofog (Katz et al., 2004), including issues with memory/learning, attention/concentration, processing speed, and executive functioning, and individuals rate fibrofog among their most troubling symptoms (Arnold et al., 2008; Bennett et al., 2007). In contrast, data regarding objective cognitive difficulties in FM are inconsistent. Some studies have corroborated self-reports of cognitive difficulties using neuropsychological tests, finding diminished performance across multiple cognitive domains in FM relative to non-FM controls, including processing speed, attention, learning and memory, working memory, and executive functions (Bell et al., 2018; Wu et al., 2018).
Other studies have shown discrepancies between subjective and objective cognitive functioning in FM (Suhr, 2003; Walitt et al., 2016) and limited/focal or no decrements in cognitive test performance (Grace et al., 1999; Kim et al., 2012; Landrø et al., 1997; Lee et al., 2010; Miró et al., 2011; Park et al., 2001; Suhr, 2003; Walitt et al., 2011; Walitt et al., 2008; Walteros et al., 2011).
A significant limitation of existing research on objective cognitive difficulties in FM, one that might explain the divergent findings, is the use of lab- or clinic-based neuropsychological tests administered at a single time point. Studies employing this approach lack ecological validity in that the lab/clinic setting does not resemble the real-world environment in which individuals perform cognitively demanding tasks (Sbordone, 1996; Spooner & Pachana, 2006). This issue is particularly salient when studying FM, as individuals with this condition exhibit higher susceptibility to distraction (Bell et al., 2018; Teodoro et al., 2018) and hypersensitivity to sensory stimuli (Carrillo-de-la-Peña et al., 2006; Harte et al., 2016; Hollins et al., 2009; Kosek et al., 1996; Lombion et al., 2009; Lorenz et al., 1996; Lötsch et al., 2012; McDermid et al., 1996; Petzke et al., 2003). These factors may contribute to objective cognitive difficulties in daily life but are mitigated in a controlled testing environment. Further, subtle cognitive changes in FM may elude traditional neuropsychological testing.
Administering these tests at a single time point may fail to capture intraindividual variations in cognitive performance that indicate poor cognitive functioning (Ram et al., 2005; West et al., 2002) and risk for future cognitive decline (Bielak et al., 2010a, 2010b). Performance on a single occasion is also susceptible to situational factors, such as a poor night’s sleep and psychosocial and environmental stressors, whereas repeated testing over time captures an individual’s average performance.
Ambulatory cognitive testing offers an alternative to standard neuropsychological testing in which objective cognitive functioning can be assessed in the lived environment and on repeated occasions, thereby increasing ecological validity and measurement reliability. Sliwinski et al. (2018) found that smartphone-based ambulatory tests of processing speed (i.e., a symbol search test) and working memory (i.e., a dot memory test) demonstrated high reliability and validity in a nonclinical sample of community adults. Whether these tests would exhibit similar psychometric properties in FM is unknown. Ambulatory cognitive tests that are feasible, reliable, and valid for administration in this population could be used in future studies of the nature, severity, and impact of cognitive difficulties, as well as to examine effects of interventions on cognitive functioning in daily life.
The goal of this observational study was to determine the feasibility, reliability, and validity of repeated ambulatory tests of processing speed and working memory in adults with FM. To identify psychometric considerations specific to FM, analyses were performed separately for individuals with FM and non-FM controls. First, we evaluated participant compliance with the testing protocol, a key determinant of feasibility, by examining average daily and within-day response rates. Second, we aggregated across test sessions to determine average between-person reliabilities, the number of testing sessions necessary to obtain high reliability, and the stability of reliabilities over time. Finally, we evaluated construct validity by examining the convergence of ambulatory cognitive tests with validated lab-based neuropsychological tests and by contrasting test scores in the FM and non-FM groups.
Method
Participants
Eligible participants were ≥ 18 years old, conversationally fluent in English, and able to read English at a minimum of a 6th-grade level. Exclusion criteria were: (1) self-reported comorbid neurologic disorder, learning disorder, or cognitive impairment; (2) current or ≥ five-year history of alcohol/recreational drug dependence; (3) hearing or vision impairment that would preclude cognitive testing; (4) diagnosis of untreated obstructive sleep apnea; and (5) atypical sleep/wake pattern (e.g., night-shift work). Fulfilling the 2016 American College of Rheumatology diagnostic criteria for FM (Wolfe et al., 2016) was required for participants in the FM group and exclusionary for participants in the control group. Participants in the control group were age-, sex-, and education-matched to participants in the FM group.
Procedures
Study procedures were approved by the Medical Institutional Review Board of the University of Michigan. The research was completed in accordance with the Helsinki Declaration. Volunteers were recruited from January 2018 through August 2018 via patient registries, community groups, fliers in health centers and community settings, and advertisements on a university-based recruitment website. Volunteers were phone-screened for eligibility. Eligible participants provided written informed consent.
Study participation involved an in-person baseline visit followed by home monitoring, which consisted of ambulatory cognitive testing embedded within an ecological momentary assessment (EMA) protocol. The 90-minute baseline visit included completing a battery of self-report measures and standardized cognitive tests and receiving training in home monitoring procedures (e.g., data collection device usage). Home monitoring commenced after the baseline visit and lasted eight days, in line with typical EMA protocols (de Vries et al., 2021). Ambulatory cognitive tests were administered using a study-specific application installed on a ZTE Axon 7 mini smartphone (5.2” display; 1080 × 1920 pixels). Participants underwent a training session in which they were shown how to use the smartphone app and practiced until they were able to demonstrate mastery of the app. Participants were instructed to keep the smartphone with them at all times except when it was important that they not be disturbed by the audible alert. The time of baseline visit completion determined the number of test sessions administered on day 1. On all subsequent days, participants completed five daily cognitive test sessions, consistent with typical EMA protocols (de Vries et al., 2021) and a prior study using these tests (Sliwinski et al., 2018). The first daily cognitive test session was initiated by the participant upon waking. The four subsequent test sessions were prompted via audible smartphone alerts programmed on a quasi-random schedule based on each individual’s typical waking time. Between-prompt intervals ranged from 3 to 4.5 hours. After the home monitoring period, participants returned the smartphones to the laboratory in postage-paid return boxes. They were compensated up to $175 for their participation.
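The alert schedule described above can be illustrated with a short Python sketch. It is not the study app’s code: `make_daily_schedule` is a hypothetical name, and the sketch assumes that each of the four gaps (starting from the participant’s typical waking time) is drawn uniformly between 3 and 4.5 hours, because the actual randomization rule is not reported. It also does not enforce any bedtime limit.

```python
import random
from datetime import datetime, timedelta


def make_daily_schedule(typical_wake, n_prompts=4, min_gap_h=3.0, max_gap_h=4.5, rng=None):
    """Return alert times for the prompted sessions on one monitoring day.

    Hypothetical sketch: the first session is self-initiated at waking, so the
    alerts are spaced forward from the typical waking time. Each gap is drawn
    uniformly between min_gap_h and max_gap_h hours (an assumption; the study's
    actual rule is not reported). No bedtime limit is enforced.
    """
    if rng is None:
        rng = random.Random()
    alerts = []
    current = typical_wake
    for _ in range(n_prompts):
        current += timedelta(hours=rng.uniform(min_gap_h, max_gap_h))
        alerts.append(current)
    return alerts


# Usage with a hypothetical typical waking time of 07:00.
wake = datetime(2018, 3, 1, 7, 0)
for alert in make_daily_schedule(wake, rng=random.Random(42)):
    print(alert.strftime("%H:%M"))
```

Passing a seeded `random.Random` makes a schedule reproducible, which is convenient when programming devices in advance.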
Measures
Baseline sociodemographic characteristics. Age, gender, race, ethnicity, years of education, and employment status were self-reported.
Baseline cognitive tests. At baseline, four cognitive tests were administered using the NIH Toolbox (Gershon et al., 2010) iPad application (Brearly et al., 2019). Performance on each test was expressed both as T-scores (mean ± SD = 50 ± 10) corrected for age, sex, race, ethnicity, and education and as uncorrected standard scores (mean ± SD = 100 ± 15). Higher scores reflected better performance.
(1) The pattern comparison test assessed processing speed. Participants were asked to identify whether pairs of visual stimuli were identical. They were allotted 85 seconds to complete as many trials as possible. (2) The list sorting test assessed working memory. Participants were presented with a series of visual and auditory stimuli (e.g., animals), which they were asked to recall in sequence based on a specific dimension (e.g., size). (3) The flanker test assessed attention and inhibitory control. Participants were asked to focus their attention on a target stimulus (i.e., an arrow) while distractor stimuli (i.e., arrows pointing in the same or the opposite direction as the target stimulus) flanked the target. On each trial, participants were asked to indicate the direction of the target stimulus. (4) The dimensional change card sort test assessed cognitive flexibility and attention. On each trial, participants were presented with a visual stimulus, which they were asked to match to a target stimulus according to alternating criteria (i.e., shape or color) as indicated by a cue word on the screen.
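Both NIH Toolbox score metrics are, at heart, rescalings of a participant’s standing relative to a reference group, differing in the target mean and SD. The simplified Python sketch below uses a hypothetical helper (`rescale`) and hypothetical raw scores and reference values to show why one raw performance can land at different points on the two metrics: the demographically corrected T-score compares a participant with a demographically similar reference group, whereas the uncorrected standard score compares the participant with the whole normative sample. The NIH Toolbox’s actual norming is more involved than this linear z-score transformation.

```python
def rescale(raw, ref_mean, ref_sd, scale_mean, scale_sd):
    """Express a raw score on a new scale relative to a reference group.

    Hypothetical helper: a linear z-score transformation, used only to
    illustrate the T-score (50 +/- 10) and standard-score (100 +/- 15) metrics.
    """
    z = (raw - ref_mean) / ref_sd
    return scale_mean + scale_sd * z


# Hypothetical values: one raw score judged against two reference groups.
raw = 40
uncorrected_ss = rescale(raw, ref_mean=45, ref_sd=10, scale_mean=100, scale_sd=15)
corrected_t = rescale(raw, ref_mean=38, ref_sd=4, scale_mean=50, scale_sd=10)
print(uncorrected_ss, corrected_t)  # 92.5 55.0
```

In this made-up case the participant is below average relative to the full sample but above average relative to demographic peers, so the two metrics are not interchangeable for a given person.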
Ambulatory cognitive tests. Two brief cognitive tests were administered five times daily using a study-specific smartphone application (Sliwinski et al., 2018).
(1) The symbol search test assessed processing speed. Each test session consisted of sixteen trials. In each trial, participants were asked to indicate by touch which of two symbol pairs at the bottom of the screen exactly matched one of four symbol pairs at the top of the screen. Seventy-five percent of trials included a lure stimulus in which one of the two symbols within a pair at the bottom of the screen matched a symbol at the top of the screen, but the pair itself did not match. The trial ended when the participant made a selection. Variables of interest included the mean, median, and standard deviation of reaction times (milliseconds) for each session. As poor task effort produces invalid scores on neuropsychological tests, sessions with <70% accuracy were excluded from analyses. This cut point is consistent with the procedures used when validating these measures in nonclinical community adults and permits excluding scores likely produced by rote responding (i.e., indiscriminate selection), wherein around 50% accuracy is expected, or intentional poor performance (“faking bad”), wherein less than 50% accuracy is expected.
(2) The dot memory test assessed working memory. During each test session, participants completed four trials, each consisting of encoding, distraction, and retrieval phases. During the encoding phase, participants were allotted three seconds to memorize the locations of three red dots appearing on a 5 × 5 square grid. The grid then disappeared, and the four-second distraction phase commenced, during which participants were instructed to touch all F’s in an array of E’s. Finally, during the retrieval phase, participants were shown an empty 5 × 5 square grid and were asked to touch the squares corresponding to the locations of the three red dots presented during the encoding phase. The trial ended when the participant pressed “Done.” For each trial, an error score was calculated as the Euclidean distance, that is, the collective distance of the three recalled dots from their correct locations. Variables of interest included the mean, maximum, and standard deviation of the Euclidean distances for each session.
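Session-level scoring for both ambulatory tests can be sketched in Python. This is an illustration under stated assumptions, not the study app’s code. The function names are hypothetical. Symbol search reaction-time summaries are assumed to use all trials, because the text does not say whether error trials were dropped. For dot memory, “collective distance” is read as the sum of the three Euclidean distances in grid-cell units, with each touched cell paired to a target by whichever pairing minimizes that sum, because the matching rule is not reported. The `chance_pass_rate` helper checks the logic behind the 70% cutoff: with 16 two-choice trials, a participant responding at random needs at least 12 correct to pass, which happens with probability 2517/65536, or just under 4%.

```python
import math
import statistics
from itertools import permutations

ACCURACY_CUTOFF = 0.70  # sessions below 70% accuracy are excluded (from the text)


def score_symbol_search(trials):
    """Summarize one symbol search session.

    trials: list of (rt_ms, correct) tuples, 16 per session in this protocol.
    Returns None for sessions below the accuracy cutoff. Assumption: RT
    summaries use every trial; the text does not say whether error trials
    were dropped.
    """
    accuracy = sum(1 for _, correct in trials if correct) / len(trials)
    if accuracy < ACCURACY_CUTOFF:
        return None
    rts = [rt for rt, _ in trials]
    return {
        "accuracy": accuracy,
        "mean_rt": statistics.mean(rts),
        "median_rt": statistics.median(rts),
        "sd_rt": statistics.stdev(rts),
    }


def chance_pass_rate(n_trials=16, cutoff=ACCURACY_CUTOFF, p=0.5):
    """Binomial probability that random responding (success probability p
    per trial) still reaches the accuracy cutoff."""
    k_min = math.ceil(cutoff * n_trials)  # 12 of 16 trials for a 70% cutoff
    return sum(
        math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(k_min, n_trials + 1)
    )


def dot_trial_error(targets, responses):
    """Summed Euclidean distance (grid cells) between the three target dots
    and the three touched cells on the 5 x 5 grid.

    Assumption: responses are paired with targets using the pairing that
    minimizes the total distance; the text does not state the matching rule.
    """
    return min(
        sum(math.dist(t, r) for t, r in zip(targets, perm))
        for perm in permutations(responses)
    )


def score_dot_memory(trials):
    """Summarize one dot memory session; trials is a list of (targets, responses)."""
    errors = [dot_trial_error(t, r) for t, r in trials]
    return {
        "mean_error": statistics.mean(errors),
        "max_error": max(errors),
        "sd_error": statistics.stdev(errors),
    }
```

Sessions that fail the accuracy screen return `None`, so they can be dropped before person-level aggregation.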
Data analyses
Analyses were performed using IBM SPSS Statistics (V26). Descriptive statistics were generated for sociodemographic characteristics and study variables. To evaluate feasibility, average daily and within-day response rates were calculated separately for participants in the FM and non-FM groups for the full home monitoring period. Average daily response rates were calculated for each day (i.e., days 1, 2, 3, 4, 5, 6, 7, and 8) by dividing the total number of test sessions completed by the total number of possible test sessions during the respective day. Average within-day response rates were calculated for each time point (i.e., 1, 2, 3, 4, and 5) by dividing the total number of test sessions completed by the total number of possible test sessions during the respective time point. To evaluate reliability, unconditional means multilevel models were generated for each ambulatory cognitive test in each group. The between-person and within-person variances were used to calculate intraclass correlations (ICC), wherein ICC = between-person variance/(between-person variance + within-person variance). Between-person reliabilities for the full home monitoring period were calculated as between-person reliability = between-person variance/(between-person variance + within-person variance/n), wherein n referred to the average number of completed sessions. Additional unconditional means multilevel models were run using test scores from assessment periods of increasing duration (e.g., day 1 alone, days 1–2, days 1–3, etc.) to determine the number of days of assessment necessary to obtain between-person reliabilities of >.80 and >.90 for each ambulatory cognitive test in each group. Separate models were also run for each day (n = 8) to determine the stability of between-person reliabilities. 
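The reliability formulas above can be sketched in Python. The study estimated variance components with unconditional means multilevel models in SPSS; as a simpler, swapped-in technique, the hypothetical `variance_components` function below uses the one-way random-effects ANOVA (method-of-moments) estimator, which assumes every participant completed the same number of sessions and can differ from model-based estimates. The `icc` and `between_person_reliability` functions implement the two formulas in the text directly. The hypothetical `sessions_needed` helper rearranges the reliability formula into its equivalent Spearman-Brown form, n·ICC/(1 + (n − 1)·ICC), to project how many sessions would reach a target reliability; the study instead determined the required number of days empirically, by refitting models to assessment periods of increasing length.

```python
import math
import statistics


def variance_components(data):
    """One-way random-effects ANOVA estimates for balanced data.

    data: list of per-person lists, each with the same number of sessions.
    A method-of-moments stand-in for a multilevel unconditional means model.
    Returns (between-person variance, within-person variance).
    """
    k, n = len(data), len(data[0])
    grand = statistics.mean(x for person in data for x in person)
    means = [statistics.mean(person) for person in data]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for person, m in zip(data, means) for x in person)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    var_between = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates
    return var_between, ms_within


def icc(var_between, var_within):
    """Proportion of total variance attributable to between-person differences."""
    return var_between / (var_between + var_within)


def between_person_reliability(var_between, var_within, n_sessions):
    """Reliability of a person's mean over n_sessions (may be a non-integer average)."""
    return var_between / (var_between + var_within / n_sessions)


def sessions_needed(icc_value, target):
    """Smallest whole number of sessions whose projected reliability reaches target."""
    n = target * (1 - icc_value) / (icc_value * (1 - target))
    return math.ceil(n - 1e-9)  # small tolerance guards against float round-up
```

For example, with an ICC of .65, `sessions_needed(0.65, 0.90)` returns 5, because five sessions give a projected reliability of about .903 and four give about .881.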
In preparation for validity analyses, person-averaged variables for symbol search performance (mean, median, and SD of reaction times) and dot memory performance (mean, maximum, and SD of the Euclidean distances) were created by averaging each participant’s respective test session scores across the home monitoring period. To evaluate known-groups validity, the person-averaged ambulatory cognitive test scores and NIH Toolbox T-scores were compared between the FM and non-FM groups using independent samples t-tests. Known-groups validity would be supported by significant group differences in performance on the ambulatory tests that parallel group differences in performance on the NIH Toolbox tests. To evaluate convergent validity, Pearson’s correlations were examined between the person-averaged ambulatory cognitive test scores and the NIH Toolbox standard scores in the FM and non-FM groups. Higher correlations between the ambulatory symbol search test and NIH Toolbox pattern comparison test of processing speed, as well as between the ambulatory dot memory test and NIH Toolbox list sorting test of working memory, would indicate stronger convergent validity. Both NIH Toolbox T-scores and uncorrected standard scores were used for describing test performance in the FM and non-FM groups and for evaluating known-groups validity. Only NIH Toolbox uncorrected standard scores were used in correlational analyses assessing convergent validity, to optimize comparison with the ambulatory test scores, for which demographically adjusted test scores are not available.
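The validity analyses can be outlined in a few lines of Python. This is a sketch, not the SPSS procedures the authors ran: `person_average`, `independent_t`, and `pearson_r` are hypothetical helpers, the t-test uses the pooled-variance (Student) form, which gives df = n₁ + n₂ − 2 (98 with two groups of 50), and p-values are omitted; `scipy.stats.ttest_ind` and `scipy.stats.pearsonr` could supply them. Note the sign convention: higher ambulatory scores (slower reaction times, larger distances) mean worse performance, whereas higher NIH Toolbox scores mean better performance, so convergence shows up as negative correlations.

```python
import math
import statistics


def person_average(session_scores):
    """Average one participant's session scores across the monitoring period.

    Sessions removed by the accuracy screen are represented as None and skipped.
    """
    return statistics.mean(s for s in session_scores if s is not None)


def independent_t(group_a, group_b):
    """Pooled-variance (Student) independent-samples t statistic and its df."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * statistics.variance(group_a)
              + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, na + nb - 2


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```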
Results
Sample characteristics. One hundred individuals participated, including 50 with FM and 50 non-FM controls. Participants were 45.1 years old on average (range = 18–73 years), were predominantly female (88%), and had an average of 15.7 years of education. They were White (81%), Black/African American (13%), Asian (3%), or multiracial (3%), and 94% were not Hispanic or Latino/a. Participants with FM reported a higher rate of unemployment (FM = 40%, non-FM = 22%; χ²(1) = 5.88, p = .02).
Feasibility of ambulatory cognitive tests. The FM group provided data for 89.5% of ambulatory cognitive test sessions and the non-FM group provided data for 90.0% of sessions. Average daily response rates ranged from 84.2% to 94.4% for the FM group and from 85.6% to 94.2% for the non-FM group (Figure 1). Average within-day response rates ranged from 85.8% to 94.9% for the FM group and from 87.2% to 94.2% for the non-FM group (Figure 2).
Reliability of symbol search test. Between-person differences accounted for 65% of the total variance in symbol search test performance in the FM group and 61% in the non-FM group (Table 1). Overall average between-person reliabilities were .98 for both groups. For the FM group, between-person reliability exceeded .80 after one day of assessment (reliability = .88) and .90 after two days (reliability = .94). For the non-FM group, between-person reliability exceeded both thresholds after one day of assessment (reliability = .91). The reliabilities of daily average scores were stable across the full assessment period, ranging from .88 to .94 for the FM group and from .90 to .93 for the non-FM group (Figure 3).
Note. FM = fibromyalgia.
Reliability of dot memory test. Between-person differences accounted for 53% of the total variance in dot memory test performance in the FM group and 44% in the non-FM group (Table 1). Overall average between-person reliabilities were .97 for the FM group and .96 for the non-FM group. For the FM group, between-person reliability exceeded .80 after two days of assessment (reliability = .85) and .90 after three days (reliability = .92). For the non-FM group, between-person reliability exceeded .80 after two days of assessment (reliability = .80) and .90 after four days (reliability = .92). The reliabilities of daily average scores were low on day 1 (reliability in FM group = .41; reliability in non-FM group = .18) but were markedly higher and stable for days 2 through 8, ranging from .83 to .90 for the FM group and from .76 to .90 for the non-FM group (Figure 3).
Validity of symbol search test. The FM and non-FM groups did not significantly differ in performance on the symbol search test, whether measured by aggregate mean reaction time (t(98) = 1.32, p = .19), median reaction time (t(98) = 1.32, p = .19), or SD of reaction time (t(98) = 1.65, p = .10) (Table 2). In contrast, on the NIH Toolbox pattern comparison test of processing speed, the FM group exhibited significantly worse T-scores than did the non-FM control group (t(98) = −2.48, p = .02), although group differences in uncorrected standard scores were not statistically significant (p = .08). Within the FM group, worse symbol search test performance was correlated with worse performance on all four NIH Toolbox tests, including pattern comparison (r = −.66, p < .001), list sorting (r = −.45, p = .001), flanker (r = −.51, p < .001), and dimensional change card sort (r = −.51, p < .001) (Table 3). Likewise, within the non-FM group, worse symbol search test performance was correlated with worse performance on all four NIH Toolbox tests, including pattern comparison (r = −.56, p < .001), list sorting (r = −.43, p = .002), flanker (r = −.43, p = .002), and dimensional change card sort (r = −.33, p = .02) (Table 4).
Note. FM = fibromyalgia. Values are mean (SD) unless indicated otherwise.
Note. FM = fibromyalgia. ᵃperson-averaged. *p < .05; **p < .01.
Note. FM = fibromyalgia. ᵃperson-averaged. *p < .05; **p < .01.
Validity of dot memory test. The FM group performed significantly worse than the non-FM control group on the dot memory test, as measured by aggregate mean error score (t(98) = 3.31, p = .001), maximum error score (t(98) = 3.34, p = .001), and SD of error score (t(98) = 3.12, p = .002) (Table 2). Similarly, on the NIH Toolbox list sorting test of working memory, the FM group exhibited significantly worse T-scores than did the non-FM control group (t(98) = −2.01, p < .05), although group differences in uncorrected standard scores were not statistically significant (p = .09). Within the FM group, worse dot memory test performance was correlated with worse performance on all four NIH Toolbox tests: pattern comparison (r = −.37, p = .009), list sorting (r = −.31, p = .03), flanker (r = −.39, p = .005), and dimensional change card sort (r = −.37, p = .008) (Table 3). Within the non-FM group, worse dot memory test performance was correlated only with worse performance on the NIH Toolbox list sorting test (r = −.35, p = .01) (Table 4).
Discussion
To our knowledge, this study is the first to test the psychometric properties of ambulatory cognitive tests administered to people with FM. The findings demonstrate the feasibility of repeated assessment of processing speed and working memory in the lived environment using smartphone-based symbol search and dot memory tests, with FM and non-FM participants each completing approximately 90% of test sessions over the course of an eight-day assessment period. The high degree of within-person variability in performance on the ambulatory cognitive tests, evidenced by the ICC values, supports the notion that repeated testing may be necessary to adequately assess cognitive functioning. As indicated by overall average between-person reliabilities of ≥ .96, the ambulatory tests produced highly reliable scores in both groups, and reliabilities > .90 were attained in as few as two days of repeated testing in the FM group (symbol search test). Intercorrelations between scores on the ambulatory cognitive tests and in-lab neuropsychological tests, along with FM vs. non-FM group differences in average scores, provided mixed evidence for construct validity.
Participant compliance with the testing protocol was examined as an indicator of feasibility. Participants were asked to complete two smartphone-based cognitive tests five times per day, upon waking and at quasi-random intervals thereafter, for eight days. Average daily response rates in both FM and non-FM groups were high across the assessment period, ranging from 84% to 94%, and were robust to differences in time of day, with average within-day response rates ranging from 86% to 95%. Similarly high response rates have been found in prior studies that employed ambulatory cognitive testing in adult lifespan and older adult samples (Cerino et al., 2021; Sliwinski et al., 2018; Yang et al., 2021), suggesting that this method is feasible across diverse groups, including, as demonstrated here, those with chronic illness and high symptom burden. Notably, this level of compliance was observed in the context of an EMA protocol in which participants also rated their FM symptoms each time they completed the cognitive tests (Kratz, Whibley, Kim, Sliwinski, et al., 2020; Kratz, Whibley, Kim, Williams, et al., 2020; Whibley et al., 2022). Thus, these data support the feasibility of administering brief, repeated tests of cognitive performance in FM and show that ambulatory cognitive testing can be embedded in a standard EMA protocol with which participants comply at relatively high rates.
Overall average between-person reliabilities for both the ambulatory symbol search and dot memory tests were very high in the FM and non-FM groups and were similar to overall average reliabilities obtained by Sliwinski et al. (2018) using a longer 14-day protocol in a nonclinical community adult sample. Importantly, very few days of repeated testing were needed to attain high reliability in this study; in the FM group, values exceeded .90 after just two days of testing with the symbol search test and after three days of testing with the dot memory test. That reliable scores can be attained so quickly, with so few assessments, shows promise for the use of ambulatory cognitive testing as an alternative to repeated in-lab/clinic neuropsychological testing, potentially mitigating several logistical challenges associated with the latter (e.g., demands on lab/clinic time, space, and personnel). The duration (eight days) and density (five times daily) of the data collection protocol were selected to address the primary study questions, which concerned momentary associations between self-reported phenomena (symptoms, perceived cognitive functioning) and cognitive test performance. Decisions about the duration and density of ambulatory cognitive performance data collection should be guided by identifying the protocol that imposes the lowest possible participant burden while still providing data that address the primary research/clinical question(s). The reliability data from this study suggest that a briefer and potentially less dense protocol than that used here may be sufficient to provide a reliable assay of cognitive performance in future FM studies.
For example, shorter data collection periods than that used here may be appropriate to detect between-group differences in change in cognitive functioning with FM treatment; however, longer and/or denser protocols may be needed depending on research/clinical objectives, such as in observational studies that aim to detect subtle intraindividual change in cognitive functioning.
Construct validity was assessed by examining correspondence between performance on the ambulatory cognitive tests and on the in-lab NIH Toolbox measures and by comparing scores in the FM and non-FM groups. Regarding the symbol search task, the strongest associations were, as expected, with scores on the NIH Toolbox test of processing speed (i.e., pattern comparison), with correlations of moderate to strong magnitude observed in both the FM and non-FM groups. These data support the convergent validity of the symbol search test. However, it is notable that significant correlations of low to moderate magnitude were also observed between the symbol search test and the other NIH Toolbox tests (i.e., list sorting, flanker, and dimensional change card sort). This raises the possibility that a common factor, such as fatigue or inattention, may have affected performance across tests.
To our surprise, known-groups analysis showed no significant group differences in average symbol search performance, despite participants with FM producing lower NIH Toolbox pattern comparison test T-scores than did non-FM controls. Although there was correlational evidence for convergence between the ambulatory and in-lab tests of processing speed, the ambulatory symbol search test used here may be less sensitive than the NIH Toolbox pattern comparison test in detecting group differences in performance. This is particularly likely if processing speed deficits are modest, as has been shown to be the case in a meta-analysis of cognitive performance in FM (Bell et al., 2018). As task difficulty is an important determinant of between-person variability in scores, it is possible that a more challenging version of the test, such as one with more complex stimuli (e.g., higher degree of similarity between stimuli in the search group, higher number of symbol pairs to consider), would have better distinguished between the groups. Alternatively, the presence of significant FM vs. non-FM differences in NIH Toolbox test performance, but lack of significant group differences in performance on the ambulatory test, could be due to the influence of FM symptoms. FM-related fatigue and attentional difficulties might impact performance on the NIH Toolbox tests to a greater degree given the tests’ longer duration, thus contributing to larger FM vs. non-FM differences in performance.
There was also mixed evidence for the validity of the dot memory test. Group differences were consistent with expectations: participants with FM, relative to non-FM controls, produced significantly worse dot memory scores and T-scores on the NIH Toolbox measure of working memory (i.e., list sorting). However, within both the FM and non-FM groups, performance on the ambulatory test was only modestly correlated with the NIH Toolbox test, and within the FM group, similar magnitude correlations were observed between dot memory performance and scores on the NIH Toolbox tests of processing speed (i.e., pattern comparison), attention and inhibitory control (i.e., flanker), and cognitive flexibility and attention (i.e., dimensional change card sort). Associations between dot memory scores and performance on tests of processing speed and attention are not surprising; slowed processing speed constrains performance in other cognitive domains (Salthouse, 1996), and attention is critical for the selection and maintenance of information in working memory (Awh et al., 2006; Oberauer, 2019).
The availability of psychometrically sound ambulatory cognitive tests has implications for the assessment of cognitive functioning in FM and other groups. Ambulatory tests offer a means of assessing cognitive performance in the lived environment and on repeated occasions, thereby addressing key limitations and criticisms of traditional neuropsychological testing, including limited ecological validity and measurement reliability. Our own work and that of others have demonstrated the clinical value of these advantages. We previously showed that momentary changes in processing speed in individuals with FM, as assessed by repeated ambulatory testing, correspond with momentary subjective reports of cognitive functioning (Kratz, Whibley, Kim, Sliwinski, et al., 2020), lending credence to perceptions of cognitive difficulties in the daily lives of individuals with FM. In a study of healthy older adults (Allard et al., 2014), performance on repeated ambulatory cognitive testing, but not on traditional neuropsychological tests, was significantly correlated with hippocampal volume, suggesting that the former may detect subtle cognitive deficits that traditional neuropsychological testing might miss. Moreover, engagement in intellectually stimulating activities, including reading and completing crossword puzzles, was shown to precede improved performance on ambulatory cognitive tests in the subsequent three-hour period, demonstrating the utility of this measurement approach for examining real-time, dynamic associations between daily life activities or behaviors and cognitive functioning.
We now consider study limitations and directions for future research. The present investigation focused on two cognitive domains, processing speed and working memory, although ambulatory tests of other cognitive domains affected by FM also merit study. Regarding the breadth of psychometric evaluation, this study examined feasibility (as indicated by compliance), between-person reliability, and construct validity. As work continues in this area, additional qualities merit consideration. The evaluation of feasibility could be expanded to include sociodemographic and clinical characteristics that affect compliance and data quality. Data on participant attitudes, perceptions, and acceptability could inform methodological improvements, and examining sensitivity to change and correlations with neuroanatomical, neurophysiological, and functional outcomes could improve clinical utility. Although the use of established neuropsychological measures to evaluate construct validity is a strength of this study, factors not considered here (e.g., environmental distractions, test anxiety) may have differentially affected performance in the laboratory and naturalistic settings, thus affecting convergence between the measures. These data also cannot determine the extent to which ambulatory cognitive test scores relate to performance on everyday cognitive tasks. This is an important area for future inquiry, as establishing correspondence between ambulatory test performance and real-world functioning will improve the clinical and research utility of these measures. There are several possible explanations for the differential patterns of correlation between the ambulatory tests and NIH Toolbox tests in the FM vs. non-FM groups, including the small sample size and a restricted range of cognitive performance, particularly in non-FM controls.
Future studies employing larger, more diverse samples with greater variability in cognitive functioning are needed to replicate the findings and to provide demographically adjusted normative data. Additionally, because the NIH Toolbox tests were administered only at baseline, we are unable to examine how changes in performance with repeated ambulatory testing compare with changes in performance with serial administration of the NIH Toolbox measures, including how practice effects might differ across these types of measures. Finally, test sessions with < 70% accuracy were excluded from analyses. Although this criterion was intended to reduce bias from data reflecting poor effort, it is possible that informative data were inadvertently excluded. As ambulatory methods continue to be refined, an important area for future investigation will be evaluating methods for assessing and controlling for suboptimal examinee effort.
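The accuracy-based exclusion rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Session` record and its fields (`person_id`, `n_correct`, `n_trials`, `score`) are hypothetical and do not describe the study's actual data format, and person-level scores are taken here as simple means of retained sessions purely for illustration. A session is retained only when its accuracy is at least 70%, matching the exclusion of sessions with < 70% accuracy.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Sessions below 70% accuracy are excluded, per the rule described in the text.
ACCURACY_THRESHOLD = 0.70


@dataclass
class Session:
    # Hypothetical record layout; field names are illustrative only.
    person_id: str
    n_correct: int
    n_trials: int
    score: float  # hypothetical session-level performance score

    @property
    def accuracy(self) -> float:
        # A session with no trials is treated as 0% accurate and will be excluded.
        return self.n_correct / self.n_trials if self.n_trials else 0.0


def filter_sessions(sessions: list[Session], threshold: float = ACCURACY_THRESHOLD):
    """Split sessions into retained (accuracy >= threshold) and excluded lists."""
    kept = [s for s in sessions if s.accuracy >= threshold]
    dropped = [s for s in sessions if s.accuracy < threshold]
    return kept, dropped


def person_means(sessions: list[Session]) -> dict[str, float]:
    """Average the scores of the given sessions within each person."""
    by_person: dict[str, list[float]] = defaultdict(list)
    for s in sessions:
        by_person[s.person_id].append(s.score)
    return {pid: mean(scores) for pid, scores in by_person.items()}
```

Keeping the `dropped` list, rather than discarding it silently, makes it possible to count excluded sessions per person and check whether the rule removes a disproportionate share of data from particular participants or groups, which bears on the concern that informative data may be lost.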
In sum, the findings show promise for the use of ambulatory approaches to assessing cognitive performance in people with FM. Administering brief, repeated smartphone-based cognitive tests in the everyday lives of individuals with FM is feasible and can produce reliable scores with few measurement occasions. Further development of ambulatory cognitive tests that are feasible, reliable, and valid remains an important area for future research in FM and other clinical populations.
Funding statement
This work was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the NIH (A.L.K., grant number K01-AR-064275); the National Multiple Sclerosis Society’s Mentor-Based Postdoctoral Fellowship Program in Rehabilitation Research (A.L.K., grant number MB-1706-27943); and the Michigan Institute for Clinical and Health Research (grant number UL1-TR-002240), which provided subject recruitment support through UMHealthResearch.org.
Conflicts of interest
None.