Introduction
The use of telehealth (i.e., audio and videoconferencing to deliver healthcare) is rapidly growing, especially since the coronavirus disease (COVID-19) pandemic (Wosik et al., 2020). Within the field of neuropsychology, an advocacy team established by the Inter Organizational Practice Committee has been providing up-to-date recommendations and guidelines on the use of teleneuropsychological assessments (TNP) to clinicians (Bilder et al., 2020). Meanwhile, a host of training resources and virtual seminars/workshops have been disseminated across neuropsychological organizations, highlighting the rapid and immense need for knowledge within the context of neuropsychological research in this relatively new territory.
Telehealth allows for increased access to health care services, especially among persons with chronic illnesses and disabilities (Lillicrap et al., 2019). It decreases barriers to accessing appropriate care (e.g., lack of transportation, financial constraints, stigmatization) and can allow patients needing specialty care to access providers regardless of geographical location (Gajarawala & Pelkowski, 2021; Moffatt & Eley, 2010; Speedie et al., 2008). These advantages are especially pertinent given that already vulnerable populations (e.g., people with HIV; PWH) could be more susceptible to contracting COVID-19 and/or face more adverse health outcomes once infected (Mirzaei et al., 2020).
Telehealth is generally well-accepted by patients, providers, and families (Parikh et al., 2013; Parsons et al., 2021; Shore, 2013), with patient reports of up to 98% satisfaction with videoconferencing and few concerns regarding privacy. Furthermore, some patients found videoconferencing more enjoyable and less anxiety-inducing in their naturalistic environment (Parikh et al., 2013). Despite the potential for decreased emotional connection compared to in-person assessments (IPA), videoconferencing has been found to provide similar personal interactions and possibly more frequent appointments (Bloem et al., 2020; Marra et al., 2020). With regard to TNP, patients and clinicians have found TNP evaluations acceptable and feasible during the COVID-19 pandemic and reported several favorable features, including saved travel time, reduced risk of COVID-19 exposure, and reduced concentration difficulties (Parsons et al., 2021).
Applications of TNP suggest a strong agreement between TNP and IPA across a variety of populations (e.g., older adults, patients with multiple sclerosis, cognitive impairment, psychiatric conditions, cerebrovascular accident) (Barcellos et al., 2021; Cullum et al., 2014; Marra et al., 2020; Matchanova et al., 2020; Tailby et al., 2020; Wadsworth et al., 2018). Among cognitively impaired and non-impaired participants, scores on neuropsychological measures across domains (i.e., memory, attention, verbal fluency, language, executive function) between TNP (administered in-clinic) and in-person conditions were highly concordant (Cullum et al., 2014). Furthermore, a recent systematic review (19 studies) of TNP validity concluded no significant effect related to video-administration of certain cognitive screeners (i.e., MMSE, MoCA), language tests (i.e., Boston Naming Test, Letter Fluency), attention/working memory tasks (i.e., Digit Span Total), and memory tests (Hopkins Verbal Learning Test-Revised) (Brearly et al., 2017; Marra et al., 2020). The majority of these studies conducted the TNP assessments over desktop/laptop (Marra et al., 2020).
According to the Inter Organizational Practice Committee, Pearson recommends a display size of at least 9.75 inches on the patient side, although few studies have examined differences in TNP validity across different device types and sizes. Passell et al. (2021) found an association between device group and reaction time such that measures with more complex stimuli and responses (e.g., Trails A & B) were most affected by screen size. Together, these studies suggest that TNP is reliable and valid for various populations, but validity across device types remains to be established.
To our knowledge, studies have yet to examine the reliability and comparability of remote TNP assessments versus traditional IPA among PWH. Previous TNP validation studies support the administration of TNP across multiple domains (e.g., memory, attention, executive function) that may be particularly relevant to assess among PWH (Becker et al., 1995; Heaton et al., 2015). Therefore, the cognitive effects of HIV may be amenable to TNP assessment.
Considering the COVID-19 pandemic, there is an immediate need for psychometrically sound neuropsychological assessments that can be administered while patients are at home, to maintain clinical neuropsychological care and ongoing research studies. However, several questions need to be answered before neuropsychologists can confidently use and interpret results from TNP evaluations. The current study will address two primary questions: How comparable are test scores obtained by TNP versus IPA? Is test reliability affected by mode of administration?
The specific purposes of this study are to (1) compare test-retest reliabilities between PWH and those without HIV (HIV−) at the participants’ two most recent IPAs (IPA1 and IPA2), and between in-person assessments and at-home TNP assessments, and (2) assess performance-level differences in neuropsychological assessment scores between IPA1 and IPA2, and between in-person and at-home TNP assessments. We hypothesized strong test-retest correlations and minimal performance-level differences in our samples of PWH and HIV−. We will additionally investigate the potential effects of technical aspects of the remote assessment environment (e.g., participant device type, interruptions to testing) on raw scores at the TNP evaluation.
Methods
Participants
Participants included 80 PWH (M age = 58.7, SD age = 11.0) and 23 HIV− (M age = 61.9, SD age = 16.7) individuals who were enrolled in NIH-funded studies at the University of California, San Diego (UCSD) HIV Neurobehavioral Research Program (HNRP) from 2005 to 2020, demonstrated capacity to consent, and provided written informed consent. All study procedures were approved by the UCSD Institutional Review Board and are in accordance with the Helsinki Declaration.
The current study is a secondary analysis of data from each participant’s TNP evaluation, IPA1, and IPA2. Inclusion criteria were (1) age 18 years or older; (2) ability to provide informed consent; (3) negative urine toxicology for illicit drugs (excluding marijuana) or negative Breathalyzer test for alcohol on the day of the in-person study visits; (4) at least two completed IPAs; (5) greater than three months between the two prior IPAs; and (6) one remote TNP evaluation completed between March 2020 and December 2020. Exclusion criteria at in-person and TNP evaluations were consistent among all parent studies and included (1) psychotic disorder diagnosis; (2) history of a non-HIV related neurological condition known to impact neurocognitive functioning (e.g., stroke, head injury with neurological complications, epilepsy); and (3) non-HIV related medical conditions associated with neurocognitive disorders. To determine language of test administration, participants were asked to self-report how well they spoke English and Spanish using a Likert-type scale (0 = not well to 3 = very well). Participants reporting equal scores for each language were tested in their preferred language (Mungas et al., 2004). Ninety-six participants were evaluated in English (93.2%). Given that each participant served as their own control, we considered it appropriate to include the Spanish speakers in analyses.
In-person psychiatric and neuromedical evaluation
The Composite International Diagnostic Interview (v2.1) was administered at the IPAs to assess current (i.e., past 12 months) and lifetime (i.e., >12 months ago) mood and substance use disorders (World Health Organization, 1997). The parent grants were funded before the publication of the DSM-5; therefore, diagnoses were made based on the DSM-IV criteria. HIV serostatus was determined using ELISA/Western blot by a CLIA-certified reference lab.
In-person neuropsychological assessment
Participants were administered a well-validated and comprehensive battery of neuropsychological assessments measuring seven cognitive domains: verbal fluency, executive function, processing speed, learning, delayed recall, working memory, and motor skills (Carey et al., 2004). This battery was designed in accordance with the international consensus conference recommendations (i.e., Frascati criteria) for HIV-associated neurocognitive disorder (Antinori et al., 2007; Heaton et al., 2010).
Teleneuropsychological evaluation setup
TNP evaluations were conducted using HIPAA-compliant Zoom with both the examiner and participants using their personal devices in their respective home environments. To the extent possible, examiner setups were standardized. Examiner standardizations included use of a virtual private network; collection of participant responses with an iPad and stylus; a secluded setting to minimize interruptions; and computers with a camera and microphone. Because examiner screens were shared with the participant, all computer notifications were disabled. To protect participant privacy, Zoom meeting rooms were password-protected and examiners used headphones. All examiners operated from the same video-based platform; however, internet connection quality and computer hardware varied between examiners.
Because TNP evaluations were conducted in participants’ naturalistic environments, participant standardizations were limited. HNRP schedulers recommended that participants find a private, quiet location in their homes, sit at a desk or table, wear headphones to improve audio quality and ensure confidentiality, and use a device with video capabilities for visual measures. Since June 2020, participants who did not have suitable home environments for TNP testing were provided the option of using a testing room (adhering to social distancing guidelines) at the HNRP for their TNP evaluation. Participants who connected by landline telephone received audio-only measures; participants who connected by tablet or personal computer received audio and visual measures; and participants who connected by smartphone received audio and visual measures, one of which needed to be adapted to conform to Zoom’s non-adjustable mobile settings (i.e., Stroop Color and Word Test). Prior to testing, examiners administered a Remote Visit Questionnaire to assess participant testing environment (e.g., privacy, device used). After testing, examiners completed a second section of the Remote Visit Questionnaire to retrospectively capture the signal/connection quality during testing, audio quality, and interruptions during testing. Interruptions to testing were considered as any auditory or visual distraction that could influence neuropsychological performance (e.g., examinee’s phone ringing, dog barking, family member speaking, garbage truck reversing).
Pre-testing sequence for the teleneuropsychological evaluation
Prior to beginning the TNP evaluation, participants received a brief introduction about the TNP procedure. To reduce the potential for distraction from self-view and video of the examiner, the video panel was minimized for participants only, leaving only test materials visible. Participants provided updated neurobehavioral and substance use histories for the interval between the prior IPA and the TNP visit. Examiners assessed substance use (e.g., alcohol, marijuana, methamphetamine) quantity since the previous IPA and lifetime quantity. Suspected intoxication at the TNP evaluation was indicated in behavioral notes.
Participants completed the Profile of Mood States (POMS), a 65-item self-report measure of mood (i.e., tension–anxiety, depression, anger–hostility, fatigue, confusion, vigor) over the previous seven days (McNair et al., 1981).
Teleneuropsychological assessment battery
A comparison of the in-person and TNP batteries is presented in Table 1. Inclusion criteria for neurocognitive measures in the TNP battery were (a) brevity, (b) common use among HNRP studies, and (c) suitability for video-administration and response recording. Measures included Hopkins Verbal Learning Test-Revised (HVLT-R Total and Delayed Recall) (Benedict et al., 1998; Diaz-Santos et al., 2021), Controlled Oral Word Association Test (COWAT; FAS, PMR) and Category (Animal) Fluency (Borkowski et al., 1967; Marquine, Morlett Paredes, et al., 2021), Action (Verb) Fluency (Woods et al., 2005), Wechsler Adult Intelligence Scale 3rd Edition (WAIS-III) Symbol Search and Letter Number Sequencing, Stroop Color and Word Test (Gooding et al., 2021; Rivera Mindt et al., 2021; Stroop, 1935), Paced Auditory Serial Addition Test (PASAT – Channel 1) (Diehr et al., 1998; Gooding et al., 2021), and the 60-item version of the Boston Naming Test (BNT; excluding item #48) (Kaplan et al., 1983).
Note. WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; WMS-III = Wechsler Memory Scale, Third Edition.
a Not administered to every participant in-person.
Individual test raw scores were converted into demographically adjusted (i.e., age, sex, education, race/ethnicity) T-scores (M = 50, SD = 10 in healthy subjects) (Antinori et al., 2007; Cherner et al., 2021; Heaton et al., 2004; Heaton et al., 2003). Individual neuropsychological tests were considered impaired when T-scores < 40 (Taylor & Heaton, 2001).
Given that neuropsychological assessments were designed to be administered in a face-to-face and in-person format, a few accommodations were made to facilitate TNP administration. First, considering internet and audio quality varies between participants, all TNP assessment instructions were presented both orally and visually. In the TNP format, verbal tasks (e.g., HVLT-R), and tasks that rely on verbal responses to visually presented stimuli (e.g., BNT; visual presentation of stimuli via shared screen) are administered similarly to in-person. Three tasks that require visual stimuli or physical interaction with stimuli were reformatted (i.e., stimuli presentation via screen share instead of booklet, verbal response instead of motor) to be included in the video format. Comparisons between IPA and TNP administration of WAIS-III Symbol Search, Stroop Color and Word Test, and PASAT – Channel 1 are described in Supplementary Table 1.
Statistical Analyses
HIV group differences on demographic characteristics were compared using independent t-tests and Chi-square statistics as appropriate. Raw scores were used for primary analyses. To examine test-retest reliability between IPA1 and IPA2, Pearson’s correlation coefficients (r) were used for normally distributed scores and Spearman’s rho correlations for scores with skewed distributions. If raw scores on IPA1 and IPA2 were highly correlated (r or ρ > .500, p < .05), a mean in-person score was calculated for each neuropsychological test to represent average in-person performance (IPA-M) (Hemphill, 2003). To examine test-retest reliability between IPA-M and TNP in the total sample and by HIV status, Pearson’s correlation coefficients were again calculated for normally distributed scores and Spearman’s rho correlations for scores with skewed distributions. Paired t-tests were used to compare performance-level differences for normally distributed raw scores and Wilcoxon signed-rank tests for skewed distributions between (1) IPA1 and IPA2; and (2) IPA-M and TNP in the total sample and by HIV status. The Benjamini-Hochberg procedure was applied to correct for multiple comparisons using a false discovery rate of 0.05. Follow-up analyses using matched paired t-tests examined differences in T-scores between IPA2 and TNP to account for the effects of age, sex, education, and race/ethnicity.
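The Benjamini-Hochberg step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the JMP routine used in the study; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not the study's data.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is significant at the given FDR."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Every test ranked at or below max_k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical unadjusted p-values for ten tests (illustration only).
example_p = [0.001, 0.007, 0.006, 0.010, 0.012, 0.20, 0.35, 0.04, 0.60, 0.90]
flags = benjamini_hochberg(example_p)

# A single small p-value among many large ones may not survive the correction.
lone_small_p = [0.012, 0.5, 0.6, 0.7, 0.8, 0.9, 0.3, 0.4, 0.2, 0.1]
lone_flags = benjamini_hochberg(lone_small_p)
```

In the second example the smallest p-value (.012) exceeds its rank-1 threshold of .05 / 10 = .005, so no test is flagged, which illustrates how an unadjusted p below .05 can be non-significant after correction.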
Descriptive statistics from three of the Remote Visit Questionnaire items were calculated. Exploratory analyses were conducted to investigate the potential effects of technical aspects of the remote assessment environment on raw scores at the TNP evaluation. Linear regressions were used to examine the association between device type and change in raw scores from IPA2 to TNP (change score = TNP − IPA2). One-way analysis of variance (ANOVA) and Tukey’s honest significant difference (HSD) tests were used to compare TNP raw scores between the four device types. A t-test was used to examine the effects of examiner-reported interruptions to testing on TNP raw scores. Statistical analyses were performed using JMP Pro version 14.0.0 (JMP®, Version 14.0.0. SAS Institute Inc., Cary, NC, 1989–2007).
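To make the device-type comparison concrete, the following sketch computes the one-way ANOVA F statistic and its degrees of freedom from grouped scores. It is an illustration under stated assumptions, not the JMP analysis itself: the function name `one_way_anova_f`, the group labels, and the scores are hypothetical, and the p-value and Tukey HSD follow-up are omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists, one per group."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = mean(all_scores)
    # Between-group sum of squares: group size times squared distance to the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each score to its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical TNP raw scores by device type (illustration only).
telephone = [12, 14, 13]
smartphone = [18, 20, 19]
tablet = [19, 21, 20]
laptop_desktop = [20, 22, 21]

f_stat, df_b, df_w = one_way_anova_f([telephone, smartphone, tablet, laptop_desktop])
```

With four device groups the between-group degrees of freedom are 3, and the within-group degrees of freedom equal the number of participants minus 4.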
Results
Demographic characteristics by HIV serostatus
Demographic characteristics at the TNP evaluation by HIV group are presented in Table 2. PWH had significantly fewer years of education and higher rates of lifetime substance use disorder (ps < .05) than HIV−. The groups did not differ on age, sex, ethnicity, Wide Range Achievement Test 4 Reading, lifetime diagnosis of Major Depressive Disorder, or mood (ps ≥ .09). All participants completed two IPAs (days apart: M = 577, SD = 716; Mdn = 365, IQR = 244–583; range = 108–3970) and one remote TNP evaluation (days apart from IPA2: M = 414, SD = 238; Mdn = 375, IQR = 277–465; range = 112–1655). Overall, 49.5% of participants had re-tests more than 1 year apart between IPA1 and IPA2, and 55.3% had re-tests more than 1 year apart between IPA2 and TNP.
Note. WRAT4 Reading = Wide Range Achievement Test; POMS = Profile of Mood States; Values are presented as M (SD) or Mdn [IQR]. Bolded values indicate p < .05; PWH = people with HIV
a N = 62, administered at the first in-person visit only, not administered to Spanish speakers.
b N = 92.
c Defined as >50 copies/mL in plasma.
d N = 40.
Test-retest reliability of neuropsychological assessments
Results of correlation analyses are presented in Table 3. There were statistically significant correlations between IPA1 and IPA2 (r or ρ = .603–.883, mdn = .744, ps < .001) and between IPA-M and TNP (r or ρ = .622–.958, mdn = .801, ps < .001) across all neuropsychological assessment raw scores.
Note. COWAT = Controlled Oral Word Association Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PASAT = Paced Auditory Serial Addition Test (Channel 1); HVLT-R = Hopkins Verbal Learning Test-Revised; Bolded values indicate p < .05.
Correlations between IPA1 and IPA2 (Table 4) were statistically significant across neuropsychological assessment raw scores in PWH (r or ρ = .596–.871, mdn = .737, ps < .001) and HIV− groups (r or ρ = .556–.943, mdn = .826, ps < .05). Correlations between IPA-M and TNP (Table 5) were statistically significant across neuropsychological assessment raw scores in PWH (r or ρ = .631–.960, mdn = .820 ps < .001) except for the BNT (ρ = .593, p = .122), and in the HIV− group (r or ρ = .593–.967, mdn = .855, ps < .05). Correlations between IPA-M and TNP in the HIV− group for COWAT PMR and WAIS-III Letter Number Sequencing were not calculated due to small sample size (n ≤ 5).
Note. COWAT = Controlled Oral Word Association Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PASAT = Paced Auditory Serial Addition Test (Channel 1); HVLT-R = Hopkins Verbal Learning Test-Revised; Bolded values indicate p < .05.
Note. COWAT = Controlled Oral Word Association Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PASAT = Paced Auditory Serial Addition Test (Channel 1); HVLT-R = Hopkins Verbal Learning Test-Revised; Bolded values indicate p < .05.
a n = 4.
Performance-level differences between neuropsychological assessments
Results of matched paired t-tests or Wilcoxon signed-rank tests between IPA1 and IPA2 are presented in Table 6. There were no significant differences between raw scores on IPA1 and IPA2 after applying the Benjamini-Hochberg procedure at a false discovery rate of 0.05 (unadjusted ps ≥ .012). Results examining performance-level differences between IPA-M and TNP are presented in Table 7. Raw scores were significantly lower at TNP than at IPA-M on COWAT (PMR) (t(7) = −3.7, p = .007), Stroop Word (t(49) = −6.1, p < .001), Stroop Color (t(48) = −3.9, p < .001), Stroop Incongruent (t(49) = −2.8, p = .006), WAIS-III Letter Number Sequencing (t(27) = −2.8, p = .010), and HVLT-R Total Recall (t(81) = −3.6, p < .001). Cohen’s effect size values for Stroop Color (d_z = .529), Stroop Incongruent (d_z = .387), WAIS-III Letter Number Sequencing (d_z = .533), and HVLT-R Total Recall (d_z = .375) suggest low to moderate practical significance. Effect size values for COWAT (PMR) (d_z = 1.11) and Stroop Word (d_z = .867) suggest high practical significance.
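For readers less familiar with the paired-samples effect size reported here, the sketch below shows how a paired t statistic and Cohen's d_z relate to one another. It assumes the standard definitions (d_z is the mean of the paired differences divided by their standard deviation, so d_z = t / √n); the function name `paired_t_and_dz` and the scores are hypothetical, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_dz(in_person, remote):
    """Return (t, df, d_z) for paired scores, with differences taken as remote - in_person."""
    diffs = [r - p for p, r in zip(in_person, remote)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff
    return t_stat, n - 1, d_z


# Hypothetical raw scores for five participants (illustration only).
ipa_m_scores = [50, 48, 55, 60, 45]
tnp_scores = [47, 46, 50, 58, 44]

t_stat, df, d_z = paired_t_and_dz(ipa_m_scores, tnp_scores)
```

A negative sign indicates lower scores at the remote assessment; the text reports magnitudes. As a rough check on the reported values, t(49) = −6.1 with n = 50 implies |d_z| ≈ 0.86, in line with the .867 reported for Stroop Word (small gaps are expected because the published t values are rounded).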
Note. Values are presented as M (SD). COWAT = Controlled Oral Word Association Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PASAT = Paced Auditory Serial Addition Test (Channel 1); HVLT-R = Hopkins Verbal Learning Test-Revised; Bolded values indicate p < .05.
a Results considered not significant after applying Benjamini-Hochberg procedure (false discovery rate of 0.05).
Note. Values are presented as M (SD). COWAT = Controlled Oral Word Association Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PASAT = Paced Auditory Serial Addition Test (Channel 1); HVLT-R = Hopkins Verbal Learning Test-Revised; Bolded values indicate p < .05.
a Results considered not significant after applying Benjamini-Hochberg procedure (false discovery rate of 0.05).
In PWH and the HIV− group, there were no significant differences between raw scores on IPA1 and IPA2 after correcting for multiple comparisons (ps > .05). In PWH, there were significantly lower raw scores on TNP assessments compared to IPA-M on COWAT (PMR) (t(7) = −3.7, p = .007), Stroop Word (t(38) = −4.6, p < .001), Stroop Color (t(37) = −3.0, p = .004), and HVLT-R Total Recall (t(61) = −3.7, p < .001) (Figure 1). In the HIV− group, there were lower raw scores on the TNP Stroop Word test compared to IPA-M (t(10) = −5.8, p < .001) (Figure 2).
Follow-up analyses using matched paired t-tests examining differences in T-scores between IPA2 and TNP showed significantly lower T-scores at the TNP assessment on Stroop Word (TNP − IPA2 = −5.72; t(57) = −5.86, p < .001), Stroop Color (TNP − IPA2 = −2.7; t(56) = −3.26, p = .002), and HVLT-R Total Recall (TNP − IPA2 = −3.65; t(92) = −3.12, p = .002). Neuropsychological test scores were considered impaired when T-score < 40. There were significantly more impaired scores in the TNP evaluation compared to IPA2 on Stroop Word (IPA2 = 18 (31%), TNP = 29 (50%); p < .001), Stroop Color (IPA2 = 19 (33%), TNP = 22 (39%); p < .001), and HVLT-R Total Recall (IPA2 = 39 (42%), TNP = 57 (61%); p < .001). On the Stroop Word test, 24% of participants went from an unimpaired to impaired score at the TNP evaluation and 5% went from impaired to unimpaired. On the Stroop Color test, 7% of participants went from an unimpaired to impaired score at the TNP evaluation and 2% went from impaired to unimpaired. On the HVLT-R Total Recall, 26% of participants went from an unimpaired to impaired score at the TNP evaluation and 6% went from impaired to unimpaired.
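The impairment transitions summarized above can be tallied directly from paired T-scores. The sketch below applies the T < 40 cutoff stated in the text; the function name `impairment_transitions`, the dictionary keys, and the T-scores are hypothetical, and the significance test used for the impairment rates is not reproduced here.

```python
IMPAIRMENT_CUTOFF = 40  # T-scores below 40 are treated as impaired, per the text


def impairment_transitions(t_in_person, t_remote, cutoff=IMPAIRMENT_CUTOFF):
    """Return the percentage of participants who changed impairment status between visits."""
    pairs = list(zip(t_in_person, t_remote))
    n = len(pairs)
    # Unimpaired in person, impaired at the remote visit.
    newly_impaired = sum(1 for a, b in pairs if a >= cutoff and b < cutoff)
    # Impaired in person, unimpaired at the remote visit.
    newly_unimpaired = sum(1 for a, b in pairs if a < cutoff and b >= cutoff)
    return {
        "n": n,
        "pct_unimpaired_to_impaired": 100 * newly_impaired / n,
        "pct_impaired_to_unimpaired": 100 * newly_unimpaired / n,
    }


# Hypothetical T-scores for ten participants (illustration only).
ipa2_t = [45, 52, 38, 41, 39, 60, 44, 35, 50, 42]
tnp_t = [39, 50, 41, 38, 36, 58, 45, 33, 47, 40]

transitions = impairment_transitions(ipa2_t, tnp_t)
```

Note that a T-score of exactly 40 counts as unimpaired under this cutoff, so the last hypothetical participant (42 to 40) is not counted as a transition.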
Additional follow-up analyses were conducted to examine the potential effects of administration language. The seven participants who were tested in Spanish were excluded. Results of matched paired t-tests were consistent with the total sample, with observed differences in raw scores on Stroop Word, Stroop Color, Stroop Incongruent, WAIS-III Letter Number Sequencing, and HVLT-R Total Recall (ps ≤ .01).
Remote visit questionnaire results
Device type, assessment type, and participant environment were evaluated from the Remote Visit Questionnaire. Of the total sample, 92 participants (89%) completed the Remote Visit Questionnaire. The most common device type used in the TNP evaluation was a smartphone (39%), followed by laptop/desktop (34%), tablet (14%), and traditional telephone (8%) (5% not documented). Audio-only assessments were not limited to participants who connected by traditional telephone. Over 75% of the TNP evaluations were conducted using both video and audio. Twenty-one percent of participants were interrupted at least once during the TNP evaluation (e.g., “cathedral bells caused some disruption to participant’s attention span”). Results of a t-test comparing raw scores by participant interruption status revealed significant differences on the HVLT-R Total Recall such that participants performed worse when there were interruptions during testing (M = 17.8; SD = 5.1) compared to no interruptions (M = 22.0; SD = 6.1; t(85) = −2.67, p = .009). Additional follow-up analyses were conducted to examine the potential effects of interruptions to testing. The nineteen participants (16 PWH, 3 HIV−) who had interruptions to testing were excluded. Results of matched paired t-tests revealed differences in raw scores on Stroop Word, Stroop Color, Stroop Incongruent, and HVLT-R Total Recall (ps < .05).
In the exploratory analyses, results revealed no significant association between device used at the TNP evaluation and change in performance from IPA2 to TNP. An ANOVA showed a significant omnibus difference across device type groups in TNP raw scores on Category Fluency – Animals (F(3,81) = 4.13, p = .008). Follow-up pairwise comparisons showed poorer performance when administered via telephone compared to smartphone, tablet, and laptop/desktop (ps < .05).
Discussion
Results of this study add to a growing body of literature demonstrating that TNP assessments are reliable and valid across diverse populations and during the COVID-19 pandemic (Barcellos et al., 2021; Brearly et al., 2017; Cullum et al., 2014; Marra et al., 2020; Matchanova et al., 2020). Among our sample of PWH and HIV−, we established test-retest reliability between two IPAs that were approximately 1 year apart, and found significant and moderate to strong correlations between participants’ IPA and TNP evaluations. Performance-level differences between IPA and TNP had variable effect sizes, with small to moderate effect sizes for Stroop Color, Stroop Incongruent, WAIS-III Letter Number Sequencing, and HVLT-R Total Recall, and large effect sizes for COWAT (PMR) and Stroop Word. Importantly, only a small sample of participants completed the COWAT (PMR). There were lower raw scores on the Stroop Color and Word Test, COWAT (PMR), WAIS-III Letter Number Sequencing, and HVLT-R Total Recall at the TNP evaluation. Accounting for the effects of age, sex, education, and race/ethnicity, results indicated significant mean differences in T-scores on Stroop Word, Stroop Color, and HVLT-R Total Recall, with the greatest T-score point difference on the Stroop Word test (TNP − IPA2 = −5.72). Across these three neuropsychological assessments, more participants went from an unimpaired T-score at IPA2 to an impaired score at the TNP evaluation; however, several participants with impaired scores at IPA2 performed in the unimpaired range at the TNP evaluation.
This could be attributable to practice effects, considering some participants had evaluations less than a year after their previous one; however, we might expect more participants to show improvement if there were significant practice effects (Dikmen et al., 1999).
Despite statistical significance, differences in raw scores and T-scores between IPA-M and TNP on the COWAT (PMR), WAIS-III Letter Number Sequencing, and HVLT-R Total Recall were minimal and possibly due to factors associated with COVID-19 (e.g., COVID-19 infection, stress, depression, social isolation), the TNP platform, or factors on the day of testing (e.g., pain, poor sleep, energy) (Hampshire et al., 2021; Suarez-Gonzalez et al., 2021). The marginal difference in raw scores from the in-person to TNP evaluation observed on the HVLT-R Total Recall (mean difference = −1.8) is consistent with another study which found poorer performance on the HVLT-R Total Recall in the video-based condition (mean difference = −2.11) among mildly impaired stroke patients (Chapman et al., 2020). The authors attributed this possibly to mishearing words in the TNP condition or participant anxiety with the TNP scenario. Results from our study indicate that interruptions to testing were significantly associated with worse performance on the HVLT-R Total Recall and that significant differences remained even after excluding participants who experienced interruptions. Therefore, differences observed on TNP tests could be attributable to other technical aspects of the TNP environment. For example, audio glitches may affect participants’ understanding of task instructions, ability to clearly hear verbal stimuli, and adequate response collection by examiners (Gardner et al., 2021). These testing environment characteristics may be important to capture in TNP practice to understand whether a poor performance may reflect change in neurocognitive function or limitations of videoconferencing.
Few studies have investigated the reliability and validity of the Stroop Color and Word Test, or a similar response inhibition measure, in the TNP setting. One study among a pediatric sample (aged 6–20) found no significant performance-level differences on the DKEFS Color Word Interference Test between in-person and home-based TNP assessment administered via tablet or laptop (Harder et al., Reference Harder, Hernandez, Hague, Neumann, McCreary, Cullum and Greenberg2020). Another study among middle-aged to older adults (aged 40–86) suggests moderate correlations between in-person and TNP assessments via desktop on the Stroop Color and Word Test (Zeghari et al., Reference Zeghari, Guerchouche, Tran-Duc, Bremond, Langel, Ramakers, Amiel, Lemoine, Bultingaire, Manera, Robert and Konig2022). In the current study, more pronounced differences in raw scores observed across the Stroop Color and Word Test may be attributable to limitations of administering a time-bound visual neuropsychological assessment via videoconferencing. In particular, lag time in communication between an incorrect response and examiner feedback could limit the remaining time in the task for correct responses. Additionally, to be administered in the TNP modality, presentation of the stimuli was reformatted to balance the number of words on each slide (i.e., 60 words/slide, 2 slides) with the number of slide changes. Communication between the participant and examiner about changing slides could also have limited participants’ opportunity to give correct responses. Considering that differences in T-scores between IPA2 and TNP were greatest for Stroop Word and progressively decreased for the remaining subtests, it is also possible that participants could benefit from more practice administration to better acclimate to video-administration of this test.
On average, participants’ T-scores on Stroop Word dropped six points at the TNP evaluation, and 24% of participants went from an unimpaired score at IPA2 to an impaired score at TNP. To the degree that this difference is constant across individuals, and unrelated to their last in-person score, it may be possible to create and apply a time-constant correction for this neuropsychological test.
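As a minimal sketch of what such a correction could look like, the snippet below adds a fixed offset to TNP Stroop Word T-scores before impairment is classified. The six-point offset is the average drop reported above; the impairment cutoff (T < 40), the function names, and the example scores are assumptions made for illustration only and are not taken from the study data.

```python
# Illustrative sketch only, not the study's method.
# MEAN_TNP_OFFSET is the average Stroop Word T-score drop reported in the text.
# IMPAIRMENT_CUTOFF (T < 40) is an assumed convention for this example.
MEAN_TNP_OFFSET = 6.0
IMPAIRMENT_CUTOFF = 40.0


def correct_tnp_t_score(tnp_t_score, offset=MEAN_TNP_OFFSET):
    """Add a constant offset to a T-score obtained via TNP."""
    return tnp_t_score + offset


def is_impaired(t_score, cutoff=IMPAIRMENT_CUTOFF):
    """Classify a T-score as impaired when it falls below the cutoff."""
    return t_score < cutoff


# Hypothetical TNP Stroop Word T-scores for three participants.
observed = [36.0, 41.0, 52.0]
corrected = [correct_tnp_t_score(t) for t in observed]

flags_before = [is_impaired(t) for t in observed]
flags_after = [is_impaired(t) for t in corrected]

print(corrected)
print(flags_before, flags_after)
```

A constant offset of this kind is only defensible if the modality difference does not depend on the in-person score; if it does, a regression-based adjustment would be the more appropriate choice.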
An exploratory analysis revealed no significant association between the device used at TNP and change in performance from IPA2 to TNP, and TNP raw scores were comparable across device types, even though device type might have been suspected to account for differences on the Stroop Color and Word Test. Despite these findings, several challenges remain in digital neuropsychology with regard to device type and characteristics including (1) variability in the perceptual, motor, and cognitive abilities needed for response behaviors; (2) variability in hardware and software between devices that may affect stimulus presentation and response latency; and (3) rapid changes in hardware, software, and device ownership which may affect tests and test norms (Germine et al., Reference Germine, Reinecke and Chaytor2019). These challenges are being rapidly investigated. Germine et al. (Reference Germine, Reinecke and Chaytor2019) found significant differences in response behavior on a digital trail making test with examinees (aged 18–35) taking less time to complete the task on an iPad compared to a personal computer, and more time on an iPhone. Thus, the question around device type and screen size warrants reevaluation with a greater sample size and power to detect meaningful differences in the future, which could inform TNP guidelines.
Among PWH, there were significant and strong correlations between IPA and TNP evaluations. Performance-level differences between IPA and TNP were minimal and consistent with findings from the total sample. Results suggest that TNP is a reliable alternative to IPA, especially during the COVID-19 pandemic, but also more broadly when considering the health burden faced by this vulnerable population (Mirzaei et al., Reference Mirzaei, McFarland, Karamouzian and Sharifi2020). PWH are living longer and are more susceptible to age-related neurodegenerative diseases and functional decline (Blackstone et al., Reference Blackstone, Moore, Heaton, Franklin, Woods, Clifford, Collier, Marra, Gelman, McArthur, Morgello, Simpson, Rivera-Mindt, Deutsch, Ellis, Hampton Atkinson and Grant2012; Heaton et al., Reference Heaton, Franklin, Ellis, McCutchan, Letendre, LeBlanc, Corkran, Duarte, Clifford, Woods, Collier, Marra, Morgello, Mindt, Taylor, Marcotte, Atkinson, Wolfson, Gelman and Grant2011; Wing, Reference Wing2016). Neurocognitive and functional decline may limit the feasibility of attending in-person neuropsychological evaluations in aging PWH (Hearps et al., Reference Hearps, Schafer, High and Landay2016). During the current COVID-19 pandemic, and especially among persons who are not vaccinated, PWH may be more fearful than the general population of going into the clinic for care or even COVID-19 testing services because of chronic immune impairment (Cooper et al., Reference Cooper, Woodward, Alom and Harky2020; Fusco et al., Reference Fusco, Sangiovanni, Tiberio, Papa, Atripaldi and Esposito2020; Mirzaei et al., Reference Mirzaei, McFarland, Karamouzian and Sharifi2020).
In addition to minimizing risk of additional infections, TNP evaluations have the potential to better maintain consistent access to care for PWH and provide benefits such as decreased time commitment, lower transportation expenses, and increased overall convenience (Gajarawala & Pelkowski, Reference Gajarawala and Pelkowski2021; Moffatt & Eley, Reference Moffatt and Eley2010; Speedie et al., Reference Speedie, Ferguson, Sanders and Doarn2008).
This study differs from other studies examining TNP in that evaluations were typically completed at home rather than in a clinical space. Although clinical spaces provide certain testing environment standardizations (e.g., adequate internet connectivity, distraction-free), evidence suggests that cognitive performance in a naturalistic environment may be more aligned with actual cognitive functioning than performance in a clinic setting (Bloem et al., Reference Bloem, Dorsey and Okun2020; Moore et al., Reference Moore, Paolillo, Sundermann, Campbell, Delgadillo, Heaton, Swendsen and Depp2021; Rentz, Reference Rentz2016). It is important to note that TNP may not be feasible for all PWH, particularly those from the most vulnerable backgrounds (e.g., those experiencing homelessness, lower socioeconomic status, less acculturated, limited access to technology), as it introduces other potential barriers such as the resources needed to obtain the necessary technology, the need for reliable internet access, and security concerns (Bilder et al., Reference Bilder, Postal, Barisa, Aase, Cullum, Gillaspy, Harder, Kanter, Lanca, Lechuga, Morgan, Most, Puente, Salinas and Woodhouse2020; Mgbako et al., Reference Mgbako, Miller, Santoro, Remien, Shalev, Olender, Gordon and Sobieszczyk2020). Although the HNRP does not currently provide technological devices (e.g., tablet) to participants for TNP testing, participants are provided the option of using a testing room at the HNRP for their evaluation.
There are several remaining questions that could not be addressed in this study: (1) Can published normative standards available for tests administered by IPA be used for those administered by TNP? Researchers may need to develop new normative data based on this modality or create adjustments for any differences derived from the TNP testing modality. (2) Can results of the same person using these two methods be compared to measure change or neurocognitive decline? Creating regression-based change scores would require a different set of visits than we used in the current study but can be helpful in determining significant change in neurocognitive performance. (3) Can results for different people in research studies be combined if some were administered the tests with IPA and some with TNP?
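To illustrate the second question, a regression-based change score can be sketched as follows: retest scores are regressed on baseline scores in a reference sample, and an individual's observed retest score is then expressed as a standardized deviation from the score the regression predicts. This is a minimal sketch under stated assumptions; the reference data, the single-predictor model, and the function names are hypothetical and were not derived from the current study.

```python
import math


def fit_srb(baseline, retest):
    """Fit retest = intercept + slope * baseline by ordinary least squares.

    Returns (intercept, slope, see), where see is the standard error of
    the estimate (residual standard deviation with n - 2 degrees of freedom).
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(retest) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, retest))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(baseline, retest))
    see = math.sqrt(sse / (n - 2))
    return intercept, slope, see


def srb_z(baseline_score, retest_score, model):
    """Standardized difference between observed and predicted retest score."""
    intercept, slope, see = model
    predicted = intercept + slope * baseline_score
    return (retest_score - predicted) / see


# Hypothetical reference sample: T-scores at two visits in the same modality.
ref_baseline = [40.0, 45.0, 50.0, 55.0, 60.0]
ref_retest = [43.0, 46.0, 52.0, 55.0, 62.0]
model = fit_srb(ref_baseline, ref_retest)

# Hypothetical individual: baseline T-score of 50, later T-score of 45.
z = srb_z(50.0, 45.0, model)
print(model, z)
```

In practice, such a model would typically include additional predictors (e.g., test-retest interval or demographic characteristics), and a z-score beyond a chosen threshold would be interpreted as reliable change.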
The current study is not without limitations. Our sample of PWH was relatively healthy, which may have increased the likelihood of high test-retest reliability (Heaton et al., Reference Heaton, Franklin, Deutsch, Letendre, Ellis, Casaletto, Marquine, Woods, Vaida, Atkinson, Marcotte, McCutchan, Collier, Marra, Clifford, Gelman, Sacktor, Morgello, Simpson and Teshome2015). Due to the remote administration, motor tests (e.g., Grooved Pegboard) could not be administered. This may limit sensitivity in assessing some domains commonly impaired among PWH. Next, examiners did not explicitly ask participants not to write down information during the TNP testing. Although we detected significant differences between IPA-M and TNP on four neuropsychological assessments, there could be potential unmeasured confounders (e.g., distractibility, participant screen clarity, audio glitches) that may account for the differences. Audio glitches or disruptions when both the participant and examiner are speaking at the same time could have caused interference. While HNRP staff outline best practices for TNP evaluations, the lack of control over standardized testing environments in the TNP is a notable limitation. Furthermore, examinees were not asked to silence their device notifications during testing. While examiners noted this as an interruption to testing, we cannot fully rule out the potential impact of this distraction on test results. Results suggest that interruptions to testing were associated with worse performance on the HVLT-R Total Recall, despite no specific information about when the interruption occurred during testing. Thus, it may be beneficial to ask examinees to disable their device notifications during testing and for examiners to include standardized information about interruptions during testing. Since examiners do not directly assess substance use at the TNP and we were unable to conduct a urine toxicology test, we cannot confirm that examinees were toxicology negative.
Considering there was a small sample of Spanish-speaking PWH who were administered the COWAT (PMR), results of significant differences between in-person and TNP should be interpreted with caution. Furthermore, there are other aspects of the Spanish-speaking sample that may make their neuropsychological evaluation particularly complex, including socio-demographic, cultural, and linguistic factors, and familiarity with telehealth and neuropsychological testing (Marquine, Rivera Mindt, et al., Reference Marquine, Rivera Mindt, Umlauf, Suarez, Kamalyan, Morlett Paredes, Yassai-Gonzalez, Scott, Heaton, Diaz-Santos, Gooding, Artiola i Fortuny, Heaton and Cherner2021). Future research may need to analyze the feasibility of TNP among Spanish-speaking PWH separately. The current study represents a secondary analysis of data from each participant’s neuropsychological evaluations, not a randomized controlled trial to validate TNP assessment. The number of HNRP participants with a completed TNP evaluation is rapidly growing; therefore, follow-up analyses with a larger sample size and likely more statistical power will be beneficial. Future research may also investigate the role of emotional factors (e.g., depression, anxiety, financial instability, stress) on TNP performance during the COVID-19 pandemic, as well as whether performance on TNP assessments may adequately discriminate between impairment classifications.
In our sample of PWH and HIV− participants, we provided evidence of test-retest reliability and performance-level comparability of our IPA and TNP evaluations. Considering the current COVID-19 pandemic, and possible additional pandemics in the future, there is an immediate need for reliable neuropsychological assessments that can be administered remotely, especially for PWH. TNP evaluation shows promise to improve access to neuropsychological services and maintain ongoing clinical research studies.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617722000777
Acknowledgements
This work was supported by NIMH-funded research programs including the HIV Neurobehavioral Research Center (HNRC), supported by award P30MH062512, the California NeuroAIDS Tissue Network (CNTN), supported by award U24MH100928, and the Multi-Dimensional Successful Aging Among HIV-Infected Adults, supported by award R01MH099987. Support for this study also includes a NIDA-funded research program that includes the Translational Methamphetamine AIDS Research Center (TMARC), supported by award P50DA026306; an NIH-funded research program that includes the CNS HIV Anti-Retroviral Therapy Effects Research (CHARTER), supported by award HHSN271201000036C; and the NIMHD-funded study Mechanisms of Disparities in Adverse Neurocognitive Outcomes among Hispanics Aging with HIV, supported by award R01MD013502. Stipend support to MK is funded by NIAAA award T32AA013525 and NIA award F31AG074838. Stipend support to NS is funded by NIDA award T32DA031098. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The San Diego HIV Neurobehavioral Research Center [HNRC] group is affiliated with the University of California, San Diego, the Naval Hospital, San Diego, and the Veterans Affairs San Diego Healthcare System, and includes: Director: Robert K. Heaton, Ph.D., Co-Director: Igor Grant, M.D.; Associate Directors: Ronald J. Ellis, M.D., Ph.D., and Scott Letendre, M.D.; Center Manager: Jennifer Iudicello, Ph.D.; Donald Franklin, Jr.; Melanie Sherman, Cheryl Kelley; NeuroMedical Core: Ronald J. Ellis, M.D., Ph.D. (Director/NeuroMedical Unit Head), Scott Letendre, M.D. (Co-I./Laboratory Unit Head), Christine Fennema-Notestine, Ph.D., (Co-I./Neuroimaging Unit Head); Debra Rosario, M.P.H., Neurobehavioral & Psychiatry Core: David J. Moore, Ph.D. (Co-Director/Neurobehavioral Unit Head), Murray B. Stein, M.D. (Co-Director/Psychiatry Unit Head), Erin E. Morgan, Ph.D. (Co-I./Psychiatric Coordinator), Andrew H. Miller, Matthew Dawson, NeuroVirology & Biology Core: Sara Gianella Weibel, M.D. (Co-Director/NeuroVirology Unit Head), Sarah A. LaMere, D.V.M., Ph.D. (Associate Unit Head, NeuroVirology Unit), Cristian Achim, M.D., Ph.D. (Co-Director/Neurobiology Unit Head), Ana Sanchez, Ph.D. (Co-I./Neurobiology Unit), Adam Fields, Ph.D. (Associate Unit Head, Neurobiology Unit); Microbiome Core: Rob Knight, Ph.D. (Co-Director), Pieter Dorrestein, Ph.D. (Co-Director); Developmental Core: Scott Letendre, M.D. (Director), Ajay Bharti, M.D. (Co-I.), J. Allen McCutchan, M.D., Christine Fennema-Notestine, Ph.D.; Administrative Core: Robert K. Heaton, Ph.D. (Director/Coordinating Unit Head), Participant Accrual and Retention Unit: J. Hampton Atkinson, M.D. (Unit Head), Jennifer Marquie-Beck, M.P.H.; Data Management and Information Systems Unit: Ian Abramson, Ph.D. (Unit Head), Clint Cushman; Statistics Unit: Florin Vaida, Ph.D. (Unit Head), Anya Umlauf, M.S., Bin Tang, Ph.D.
Conflict of interest
The author(s) have nothing to disclose.