Spatial neglect in the digital age: Influence of presentation format on patients’ test behavior

Hannah Rosenzopf; Christoph Sperber; Franz Wortha; Daniel Wiesen; Annika Muth; Elise Klein; Korbinian Möller; Hans-Otto Karnath

doi:10.1017/S1355617722000790

Published online by Cambridge University Press: 28 October 2022

Hannah Rosenzopf: Affiliation:
Centre of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
Christoph Sperber: Affiliation:
Centre of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
Franz Wortha: Affiliation:
Department of Psychology, University of Greifswald, Greifswald, Germany
Daniel Wiesen: Affiliation:
Centre of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
Annika Muth: Affiliation:
Centre of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
Elise Klein: Affiliation:
University of Paris, LaPsyDÉ, CNRS, Sorbonne Paris Cité, Paris, France; Leibniz Institut für Wissensmedien, Tuebingen, Germany
Korbinian Möller: Affiliation:
Leibniz Institut für Wissensmedien, Tuebingen, Germany; Centre for Mathematical Cognition, School of Science, Loughborough University, Loughborough, United Kingdom; Centre for Individual Development and Adaptive Education of Children at Risk (IDeA), Frankfurt, Germany
Hans-Otto Karnath*: Affiliation:
Centre of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany; Department of Psychology, University of South Carolina, Columbia, SC, USA
*: Corresponding author: Hans-Otto Karnath, email: [email protected]

Abstract

Objective:

Computerized neglect tests could significantly deepen our disorder-specific knowledge by effortlessly providing additional behavioral markers that are difficult or impossible to extract from existing paper-and-pencil versions. This study investigated how testing format (paper versus digital) and screen size (small, medium, large) affect the Center of Cancelation (CoC) in right-hemispheric stroke patients in the Letters and the Bells cancelation task. Our second objective was to determine whether a machine learning approach could reliably classify patients with and without neglect based on their search speed, search distance, and search strategy.

Method:

We compared the CoC measure of right hemisphere stroke patients with neglect in two cancelation tasks across different formats and display sizes. In addition, we evaluated whether three additional parameters of search behavior that became available through digitization are neglect-specific behavioral markers.

Results:

Patients’ CoC was not affected by test format or screen size. Additional search parameters demonstrated lower search speed, increased search distance, and a more strategic search for neglect patients than for control patients without neglect.

Conclusion:

The CoC seems robust to both test digitization and display size adaptations. Machine learning classification based on the additional variables derived from computerized tests succeeded in distinguishing stroke patients with spatial neglect from those without. The investigated additional variables have the potential to aid in neglect diagnosis, in particular when the CoC cannot be validly assessed (e.g., when the test is not performed to completion).

Keywords

diagnostics test digitalization center of cancelation right hemisphere stroke Human

Type: Research Article
Information: Journal of the International Neuropsychological Society, Volume 29, Issue 7, August 2023, pp. 686-695

DOI: https://doi.org/10.1017/S1355617722000790
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © INS. Published by Cambridge University Press, 2022

Introduction

Spatial neglect is a common result of unilateral, predominantly right-hemispheric brain damage (Becker & Karnath, Reference Becker and Karnath2007). Its core symptoms include an egocentric bias in gaze direction and exploration towards the ipsilesional side (Corbetta & Shulman, Reference Corbetta and Shulman2011; Karnath & Rorden, Reference Karnath and Rorden2012). Cancelation tasks are one type of test used to detect and quantify these symptoms (Weintraub & Mesulam, Reference Weintraub, Mesulam and Mesulam1985; Gauthier, Dehaut & Joanette, Reference Gauthier, Dehaut and Joanette1989; Ferber & Karnath, Reference Ferber and Karnath2001). They are commonly presented on sheets of paper placed in front of the patient, who is required to find and manually mark all targets among distractors. Patients with spatial neglect often miss targets on the contralesional side. The presence and severity of spatial neglect can be measured by the Center of Cancelation (CoC; Rorden & Karnath, Reference Rorden and Karnath2010), which assesses the average position of correctly marked targets with respect to the patient’s ego center.

While paper-and-pencil-based cancelation tasks can be a time-efficient yet reliable diagnostic alternative to more extensive test batteries (Fullerton, Stout, & McSherry, Reference Fullerton, Stout and McSherry1986; Ferber & Karnath, Reference Ferber and Karnath2001), they provide only part of the information they could if they were computer-based (Schendel & Robertson, Reference Schendel and Robertson2002; Bonato & Deouell, Reference Bonato and Deouell2013, Dalmaijer, Van der Stigchel, Nijboer, Cornelissen, & Husain, Reference Dalmaijer, Van der Stigchel, Nijboer, Cornelissen and Husain2015). Among other aspects, digitization can provide additional variables such as response time, revisits (of already marked items), and information concerning the search path applied (Donnelly et al., Reference Donnelly, Guest, Fairhurst, Potter, Deighton and Patel1999; Dalmaijer et al., Reference Dalmaijer, Van der Stigchel, Nijboer, Cornelissen and Husain2015).

However, due to the lack of comparison with traditional, validated paper-and-pencil versions, it cannot yet be excluded that variations in test format may lead to results that differ from those of traditional paper-and-pencil versions. Furthermore, in clinical practice, traditional A4 paper-and-pencil tests will likely be implemented as scaled-down versions matching commonly used tablet sizes. However, the effect of using devices of different sizes on the validity of the tests has not yet been sufficiently studied in the context of cancelation tasks. Concerning line bisection, another means used to diagnose neglect, previous observations have suggested that the length of the bisected line may have some influence on spatial attentional processing (Bowers & Heilman, Reference Bowers and Heilman1980; McCourt & Jewell, Reference McCourt and Jewell1999; Anderson, Reference Anderson1997). On the other hand, studies in neurological patients have suggested that a change in frame size, that is, the size of the space searched by the patient, does not necessarily affect neglect-specific impairments. Body-centered (egocentric) and object-centered (allocentric) neglect appeared to dynamically adapt to different frame sizes (Karnath & Niemeier, Reference Karnath and Niemeier2002; Baylis, Baylis, & Gore, Reference Baylis, Baylis and Gore2004; Karnath, Mandler & Clavagnier, Reference Karnath, Mandler and Clavagnier2011; Li, Karnath & Rorden, Reference Li, Karnath and Rorden2014).

In the present study, we compared right hemisphere stroke patients’ performance in cancelation tasks across different formats (paper-and-pencil vs. digital) and display sizes (small, medium, large) to investigate whether digitization of traditional cancelation tasks to various screen sizes affects their validity. As new variables become available through digitization, a further objective was to evaluate their contribution to diagnostic decisions. This is important because in clinical practice patients cannot always complete a cancelation task (e.g., because they are too exhausted or because testing must be interrupted due to other clinical necessities). While measuring the CoC requires running the test to completion, other behavioral variables might become extractable early on and thus aid diagnosis (if a test cannot be completed), provided that these parameters prove to detect neglect-specific behavior. Based on previous observations on neglect patients’ visual coordination (Karnath & Huber, Reference Karnath and Huber1992; Donnelly et al., Reference Donnelly, Guest, Fairhurst, Potter, Deighton and Patel1999; Ptak, Golay, Müri, & Schnider, Reference Ptak, Golay, Müri and Schnider2009; Machner et al., Reference Machner, Dorr, Sprenger, von der Gablentz, Heide, Barth and Helmchen2012; Kaufmann et al., Reference Kaufmann, Cazzoli, Pflugshaupt, Bohlhalter, Vanbellingen, Müri, Nef and Nyffeler2020), we investigated the parameters search speed (number of targets found relative to time), search distance (the mean distance between two consecutive targets), and search strategy (a calculation of search path) for their ability to predict spatial neglect.

Methodological investigations have shown that effects revealed by statistical analyses often have limited informative value in (applied) diagnostic contexts, even when effect sizes are very large (Dwyer, Falkai & Koutsouleris, Reference Dwyer, Falkai and Koutsouleris2018). Due to their strong focus on generalization and prediction of unknown data, machine learning approaches appear to be more suitable in most diagnostic applications than most statistical modeling approaches (Dwyer, Falkai & Koutsouleris, Reference Dwyer, Falkai and Koutsouleris2018). The specific use of machine learning models in diagnostic processes can vary, ranging from automatic evaluations of diagnostic tasks (Chen et al., Reference Chen, Stromer, Alabdalrahim, Schwab, Weih and Maier2020) to interpretable classifications that outperform traditional paper-based tests in the prediction of neuropsychiatric disorders (Souillard-Mandar et al., Reference Souillard-Mandar, Penney, Schaible, Pascual-Leone, Au and Davis2021). Accordingly, in the present investigation, we tested the potential diagnostic value of process parameters obtained from digital cancelation tests using such approaches.

Method

Subjects

Nineteen continuously admitted acute right hemisphere stroke patients (N = 8 without spatial neglect; N = 11 suffering from spatial neglect) and one chronic neglect patient who returned for a follow-up neuropsychological investigation were recruited at the Centre of Neurology at Tuebingen University. Structural imaging was acquired by computed tomography as part of the clinical routine conducted for all stroke patients at admission except for one patient who received magnetic resonance imaging instead. Patients with diffuse or bilateral brain lesions, patients with tumors, and patients without obvious lesions were not included. According to the routine clinical neurological examination, patients did not suffer from any further neurological pathologies. Clinical and demographic variables of the two patient groups are summarized in Table 1; Figure 1 illustrates an overlap plot of their brain lesions. The study was performed in accordance with the revised Declaration of Helsinki. The local ethics committee approved the study, and all patients provided their written consent to participate.

Figure 1. Lesion overlays. Overlay of the normalized lesions of right hemisphere patient groups with and without spatial neglect. Lesion boundaries were delineated semi-automatically using the Clusterize algorithm of the SPM Clusterize toolbox (cf. De Haan et al., Reference De Haan, Clas, Juenger, Wilke and Karnath2015) on SPM 12 (http://www.fil.ion.ucl.ac.uk/spm). Normalization of CT or MR scans to MNI space with 1x1x1 mm resolution was performed by using the Clinical Toolbox (Rorden, Hjaltason et al., Reference Rorden, Hjaltason, Fillmore, Fridriksson, Kjartansson, Magnusdottir and Karnath2012) under SPM12, and by registering lesions to its age-specific MR or CT templates oriented in MNI space (Rorden, Bonilha et al., Reference Rorden, Bonilha, Fridriksson, Bender and Karnath2012).

Table 1. Demographic and clinical data of the 20 right-hemispheric patients with and without spatial neglect included in the study. Mean, standard deviation

* derived from initial diagnostics;

** pooled from digital versions (data was not evident from paper-and-pencil versions).

All patients were clinically examined with a bedside neglect screening upon admission to the Centre of Neurology. This screening determined patients’ allocation to the neglect group or the control group. The 19 acute stroke patients were tested on average 6.4 days (SD = 4.5) post-stroke; the chronic neglect patient was tested 32 months post-stroke. The screening included two cancelation tasks (Bells test [Gauthier et al., Reference Gauthier, Dehaut and Joanette1989]; Letters test [Weintraub & Mesulam, Reference Weintraub, Mesulam and Mesulam1985]), and a copying task (Johannsen & Karnath, Reference Johannsen and Karnath2004). These tasks were presented on a DIN A4 sized 297 by 210 mm paper each. We calculated the CoC using the procedure and cut-off scores for neglect diagnosis by Rorden and Karnath (Reference Rorden and Karnath2010) for both the Letters (cut-off: −/+ 0.083) and Bells test (cut-off: −/+ 0.081). The CoC is a sensitive measure capturing both number and location of omissions, with zero representing an equal distribution of correctly identified stimuli along the x-axis of the test sheet. Negative deviations (with a maximum of −1) indicate a bias to the left side of the test sheet. Positive deviations (with a maximum of 1) indicate a bias to the right side of the test sheet. The copying task requires patients to copy a complex multi-object scene consisting of four figures (a fence, a car, a house, and a tree), with two of them located in each half of the horizontally oriented sheet of paper. Omission of at least one of the contralesional features of each figure was scored as 1, and omission of each whole figure was scored as 2. One additional point was given when contralesionally located figures were drawn on the ipsilesional side of the paper sheet. The maximum score was 8. A score higher than 1 (i.e., > 12.5% omissions) was taken to indicate neglect. 
The duration of each test depended on the patient being satisfied with his/her performance and confirming this twice. Spatial neglect was diagnosed if patients scored within the pathological ranges of at least 2 out of 3 tests (see Table 1).
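The scoring logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the original CoC software by Rorden and Karnath: the normalization used here (centering on the mean horizontal position of all targets and dividing by half the horizontal range) is an assumption, as is treating a score strictly beyond the cut-off as pathological. The function names are hypothetical; only the cut-off values and the 2-out-of-3 rule are taken from the text.

```python
# Cut-off scores reported in the text (Rorden & Karnath, 2010).
CUTOFFS = {"letters": 0.083, "bells": 0.081}


def center_of_cancelation(all_target_x, marked):
    """Approximate CoC from horizontal target positions.

    all_target_x: x-position of every target on the sheet.
    marked: booleans, True where the patient marked the target.
    Assumption: scores are centred on the mean of all targets and scaled
    by half the horizontal range, so they fall roughly within [-1, 1].
    """
    mean_all = sum(all_target_x) / len(all_target_x)
    half_range = (max(all_target_x) - min(all_target_x)) / 2
    marked_x = [x for x, hit in zip(all_target_x, marked) if hit]
    if not marked_x:
        raise ValueError("no targets were marked")
    return (sum(marked_x) / len(marked_x) - mean_all) / half_range


def is_pathological(test, coc):
    """True if the CoC lies beyond the +/- cut-off for the given test."""
    return abs(coc) > CUTOFFS[test]


def neglect_diagnosis(letters_coc, bells_coc, copying_score):
    """Neglect is diagnosed if at least 2 of the 3 tests are pathological."""
    flags = [
        is_pathological("letters", letters_coc),
        is_pathological("bells", bells_coc),
        copying_score > 1,  # copying task: a score higher than 1 indicates neglect
    ]
    return sum(flags) >= 2


# Toy sheet with five evenly spaced targets; only the two rightmost are marked.
targets = [-1.0, -0.5, 0.0, 0.5, 1.0]
coc_right_bias = center_of_cancelation(targets, [False, False, False, True, True])
coc_complete = center_of_cancelation(targets, [True] * 5)
```

In this toy example, marking all targets yields a CoC of zero, whereas marking only the right-sided targets yields a positive score, in line with the sign convention described above.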

Material and procedure

The experiment included the same cancelation tasks as the clinical assessment, that is, the Bells and Letters test, presented on A4 sheets of paper. In addition, the experiment comprised computerized touch screen versions of these cancelation tasks. Computerized testing was performed on a capacitive 27-inch multi-touch display (3M – M2767PW), connected to a laptop (HP ProBook 4740s with Windows 7 Professional). The touchscreen versions of the two tasks were custom created using MATLAB 2016b and Psychtoolbox (https://doi.org/10.17632/6dzxs69j7d.1). Computerized cancelation tests (touch screen – TS) were high-resolution versions of the original test images used for the paper-and-pencil version displayed in three different sizes: 260.28 mm × 173.52 mm (“TS small”; a tablet size as found in, e.g., Microsoft Surface, HP Elite, Dell Latitude 5290), 297 mm × 210 mm (“TS medium”; equivalent to an A4 paper), and 597.6 mm × 336.2 mm (“TS large”; full-screen size of the 27-inch touch screen). The small and medium versions were displayed centrally on the 27-inch display, with a black margin between the end of the test and the end of the screen. Despite the different sizes in the respective conditions, test coordinates were always expressed as relative distances from the center to the borders, ranging between −1 and 1, with −1/−1 representing the upper left corner. To keep paper-and-pencil and touchscreen conditions as comparable as possible, the touchscreen lay flat on the table and a touchscreen compatible pen (Adonit Dash 2) was used to mark the targets. Patients’ marks were visualized in real-time, providing patients with visual feedback comparable to that provided by conventional pens on a regular sheet of paper. Due to health issues, four patients were unable to complete all trials, which led to 9 missing data sets in different test conditions. These patients had to be excluded from parts of the analyses.
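The size-independent coordinate system can be illustrated with a short Python sketch. It assumes that raw positions are available in millimeters measured from the upper left corner of the displayed test image; the study's MATLAB implementation may have worked in pixels or used a different origin. The function name `normalize` and the dictionary `SIZES_MM` are hypothetical; the image dimensions are those given in the text.

```python
# Width and height (in mm) of the three touch screen conditions, as reported.
SIZES_MM = {
    "TS small": (260.28, 173.52),
    "TS medium": (297.0, 210.0),
    "TS large": (597.6, 336.2),
}


def normalize(x_mm, y_mm, condition):
    """Map a position (mm from the upper left corner of the test image)
    to relative coordinates in [-1, 1], with (-1, -1) as the upper left corner."""
    width, height = SIZES_MM[condition]
    return 2 * x_mm / width - 1, 2 * y_mm / height - 1


upper_left = normalize(0.0, 0.0, "TS small")
center = normalize(297.0 / 2, 210.0 / 2, "TS medium")
lower_right = normalize(597.6, 336.2, "TS large")
```

With this mapping, the same relative target position yields identical coordinates in all three conditions, which is what makes measures such as search distance comparable across screen sizes.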

In the experiment, half of the participants started with the paper-and-pencil version of the two cancelation tasks, the other half with the touchscreen versions. The order of the two paper-and-pencil versions was alternated; the order of the 6 different touchscreen versions was randomized. Participants were instructed to find all the bells/“A”s that were spread among distractors and to tell the experimenter once they were done. Before starting the next trial, patients confirmed that they were indeed done with this trial, that is, could not find any other target stimuli.

Data analysis

To compare right hemisphere stroke patients’ CoC performance in cancelation tasks across different formats (paper-and-pencil vs. digital) and display sizes (small, medium, large), we used Wilcoxon and Friedman tests, respectively. To measure (1) search speed, we took a participant’s total number of correctly identified items and divided it by the time (in seconds) between starting the test and marking the last item, yielding the number of targets found per second. For (2) search distance, we averaged the Euclidean distance between every two targets found in direct succession. While search distance was defined as Euclidean distance, a high degree of (3) search strategy was defined by a pattern that keeps the steps along either the (assumed) x- or y-axis small, resulting in a row-wise (small distances on the y-axis) or column-wise (small distances on the x-axis) search (for an illustration of the distinction see Figure 2). Both distances were averaged across all found targets. A strategic search, as we define it here, should result in low values in either the mean x-axis distance or the mean y-axis distance. Low y values indicate a row-wise left-to-right (reading-like; Figure 3A) or alternating left-to-right and right-to-left (Figure 3B) search pattern; low x values indicate a column-wise top-to-bottom (Figure 3C) or alternating top-to-bottom and bottom-to-top (Figure 3D) search pattern. The measure is independent of direction and also applies if tests were started from the right or the bottom. To investigate potential differences between (i) the digital screen sizes and (ii) right-hemispheric patients with spatial neglect in comparison to patients without neglect, we applied 2 × 3 analyses of variance for each of the three parameters above (i.e., search speed, search distance, and search strategy) with the between-subjects factor group (neglect vs. no neglect) and the within-subjects factor screen size (TS small vs. TS medium vs. TS large).
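The three parameters defined above can be sketched in Python as follows. The input format (a time-ordered list of `(t_seconds, x, y)` tuples with time counted from test start and coordinates on the relative −1 to 1 scale) and the function name `search_parameters` are assumptions made for illustration. The text does not state how the mean x-axis and mean y-axis distances are combined into one strategy score; taking the smaller of the two is an assumption of this sketch, chosen so that lower values indicate a more strategic search.

```python
import math


def search_parameters(marks):
    """Compute search speed, distance, and strategy from marked targets.

    marks: list of (t_seconds, x, y) for correctly marked targets in the
    order they were marked; t is measured from the start of the test.
    """
    if len(marks) < 2:
        raise ValueError("at least two marked targets are required")

    # (1) Search speed: targets found per second until the last mark.
    speed = len(marks) / marks[-1][0]

    # Absolute step sizes between targets found in direct succession.
    dx = [abs(b[1] - a[1]) for a, b in zip(marks, marks[1:])]
    dy = [abs(b[2] - a[2]) for a, b in zip(marks, marks[1:])]

    # (2) Search distance: mean Euclidean distance between consecutive targets.
    distance = sum(math.hypot(x, y) for x, y in zip(dx, dy)) / len(dx)

    # (3) Search strategy: low mean y-steps indicate a row-wise search,
    # low mean x-steps a column-wise search. ASSUMPTION: the smaller of
    # the two means is used as the single strategy score.
    mean_dx = sum(dx) / len(dx)
    mean_dy = sum(dy) / len(dy)
    strategy = min(mean_dx, mean_dy)

    return {"speed": speed, "distance": distance, "strategy": strategy}


# Toy example: three targets marked left to right along one row.
row_search = search_parameters([(1.0, -0.5, 0.0), (2.0, 0.0, 0.0), (4.0, 0.5, 0.0)])
```

Because only absolute step sizes enter the strategy score, the measure does not depend on search direction, in line with the description above.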

Figure 2. Distinction between distance and strategy. Distance was defined as the Euclidean distance between two consecutive targets. Search strategy in our case assumes that the more strategic the search is, the more it follows a row or column-wise search pattern, manifesting in small distances along the y- or x-axis, respectively. While it is possible that the target with the lowest Euclidean distance is also the most strategic one this is not necessarily the case. E.g. from position A target B minimizes both Euclidean distance and the distance along the y-axis. From position B, on the other hand, the most strategic step (i.e. minimizing y-distance as before) is target C, while the target that is overall the closest (and therefore minimizing Euclidean distance) is target D.

Figure 3. Measured search strategies. Search strategies covered by variable “search strategy” in the present study: (A) left-to-right (reading-like) strategy; (B) alternating left-to-right and right-to-left search pattern; (C) column-wise top-to-bottom strategy; (D) alternating top-to-bottom and bottom-to-top search pattern.

To finally analyze if the three parameters above can be used to reliably predict participants’ neglect diagnosis (dichotomized: spatial neglect vs. no neglect), we used Support Vector Machines (SVM). Given that SVM require complete data sets, we first used Multiple Imputation by Chained Equations (MICE; White, Royston, & Wood, Reference White, Royston and Wood2011) to impute missing data for this analysis step only. It entailed that missing values in a given column were estimated using a Bayesian Ridge Regressor, predicting values of the current column from all other columns. MICE was carried out column-wise from the column with the least number of missing values to the column with the most missing values. The potential impact of the imputation was tested by rerunning all analyses with a dataset where missing values were omitted. In the following sections, only results for the imputed dataset will be reported, because the pattern of results remained identical with and without imputation. Due to the sample size of the present study, we decided to use a dataset containing all screen sizes (TS small, TS medium, TS large) for each participant. To account for the dependence of data points in this approach (i.e., three measures for each participant), we tested our models through Leave-One-Subject-Out Cross-Validation. In this procedure, the machine learning model is trained on data for all participants but one and tested on the participant that was left out for training. This process is then repeated until each participant was predicted once and prediction outcomes (i.e., balanced accuracy due to the unequal group sizes; Brodersen, Ong, Stephan & Buhmann, Reference Brodersen, Ong, Stephan and Buhmann2010) are averaged across predictions for all participants. 
Hyperparameters (i.e., the kernel: linear or radial basis function; cost parameter: ranging from 0.01 to 10) were optimized through a grid search in a nested Leave-One-Subject-Out Cross-Validation (within the training dataset). This procedure was carried out separately for each task (Bells and Letters test) and balanced accuracy scores were obtained across all screen sizes for both tasks. Lastly, the percentages of correctly classified neglect and right-hemispheric control patients were accumulated for each screen size and test. To test if the classification accuracy varied by screen size, chi-square tests of independence were used to compare the distribution of correctly classified neglect and right-hemispheric control patients across screen sizes for each task (Bells and Letters test). All machine learning analyses were conducted in Python using the scikit-learn module (Pedregosa et al., Reference Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos, Cournapeau, Brucher, Perrot, Duchesnay and Louppe2011).
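A pipeline of this kind might look as follows in scikit-learn. This is a minimal sketch on synthetic data, not the authors' code: the feature values are randomly generated, the exact grid of cost values and the feature standardization step are assumptions, and scikit-learn's `IterativeImputer` (a MICE-inspired imputer that, by default, uses a Bayesian ridge regressor and visits columns from fewest to most missing values) stands in for the imputation described above. Balanced accuracy is computed here over the pooled out-of-sample predictions, which is a simplification of averaging across participants.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: 10 hypothetical subjects x 3 screen sizes,
# features = (search speed, search distance, search strategy).
rng = np.random.default_rng(0)
labels = np.array([0] * 5 + [1] * 5)          # 0 = no neglect, 1 = neglect
groups = np.repeat(np.arange(10), 3)          # subject ID for every row
y = np.repeat(labels, 3)
centers = np.array([[0.20, 0.55, 0.45], [0.12, 0.67, 0.25]])
X = centers[y] + rng.normal(0, 0.02, size=(len(y), 3))
X[[4, 17], [1, 2]] = np.nan                   # two missing values

# Chained-equation imputation with a Bayesian ridge regressor.
imputer = IterativeImputer(estimator=BayesianRidge(),
                           imputation_order="ascending", random_state=0)
X_imp = imputer.fit_transform(X)

# Assumed grid: the text only gives the kernels and the range 0.01 to 10.
param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.01, 0.1, 1, 10]}

predictions = np.empty_like(y)
for train, test in LeaveOneGroupOut().split(X_imp, y, groups):
    # Inner leave-one-subject-out loop tunes hyperparameters on training
    # subjects only. Each inner test fold holds a single subject (one
    # class), where plain accuracy equals balanced accuracy.
    search = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                          param_grid, cv=LeaveOneGroupOut(), scoring="accuracy")
    search.fit(X_imp[train], y[train], groups=groups[train])
    predictions[test] = search.predict(X_imp[test])

score = balanced_accuracy_score(y, predictions)
print(f"Balanced accuracy (pooled predictions): {score:.3f}")
```

Splitting by subject rather than by row keeps the three measurements of one participant together, so that no participant contributes to both training and test data in any fold.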

Results

Comparison between paper-and-pencil and digital formats

To investigate whether digital versus paper-and-pencil test format has an impact on patients’ performance in cancelation tasks, we compared patients’ mean CoC scores in the A4 paper-and-pencil version to those in the digital touch screen version of the same size (TS medium). Data are illustrated in Figure 4. Wilcoxon tests indicated no significant median CoC differences between the digital and the paper-and-pencil versions, either in the Letters test (Z = 1.784, p = 0.072) or in the Bells test (Z = 0.533, p = 0.594). In clinical practice, the traditional A4 paper-and-pencil tests will most likely be implemented as a downscaled version to match the currently used tablet size. Thus, we also investigated (cf. Figure 4) whether differences in performance arise between the established A4 paper-and-pencil version and the digital downsized tablet size (TS small). Again, we did not find significant differences for either the Letters test (Z = 1.784, p = 0.074) or the Bells test (Z = −0.356, p = 0.722).

Figure 4. CoC in paper and pencil vs. digital versions. Neglect patients’ performance for the Bells test and the Letter cancellation test in the traditional A4 paper-and-pencil version compared to the digital format with an equivalent touchscreen size (TS medium) as well as to the digital format with the smaller size that corresponds to a current tablet format (TS small). The bold lines represent mean values (including error bars) averaged over all patients.

Comparison between different sizes of the digital format

Center of cancelation

To investigate whether size variation between the digital versions affects cancelation performance, we used the CoC as dependent variable and test size (TS small vs. TS medium vs. TS large) as independent variable. Data are illustrated in Figure 5. Friedman tests revealed no significant effect for the Letters test (χ² F(2) = 1.750, p = 0.417). The result for the Bells test (χ² F(2) = 6.00, p = 0.050) was right at the threshold of significance. We therefore applied post hoc Wilcoxon comparisons to check for significant pairwise differences. All three comparisons were non-significant.
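For readers who wish to run this type of analysis themselves, the following Python sketch applies a Friedman test followed by pairwise Wilcoxon signed-rank tests using SciPy. The CoC values are randomly generated placeholders, not the study data, and the variable names are hypothetical. Note that `scipy.stats.wilcoxon` reports the signed-rank statistic by default rather than the Z value given in the text.

```python
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Synthetic CoC scores for 10 hypothetical patients in three screen sizes.
rng = np.random.default_rng(1)
base = rng.uniform(0.1, 0.8, size=10)
coc = {
    "TS small": base + rng.normal(0, 0.03, size=10),
    "TS medium": base + rng.normal(0, 0.03, size=10),
    "TS large": base + rng.normal(0, 0.03, size=10),
}

# Omnibus test across the three repeated measurements.
friedman_stat, friedman_p = friedmanchisquare(*coc.values())

# Post hoc pairwise Wilcoxon signed-rank comparisons.
posthoc_p = {
    (a, b): wilcoxon(coc[a], coc[b]).pvalue
    for a, b in combinations(coc, 2)
}

print(f"Friedman chi-square(2) = {friedman_stat:.3f}, p = {friedman_p:.3f}")
for pair, p in posthoc_p.items():
    print(pair, f"p = {p:.3f}")
```

Both tests are rank-based and paired, so each patient must contribute a value in every condition being compared; patients with missing conditions have to be dropped from the respective comparison.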

Figure 5. CoC scores over different digital test sizes. Neglect patients’ performances in the three different touch screen formats of the digitalized Bells test and Letters cancellation test. Test scores of each patient were connected with a line (patients who failed to complete the medium test size version were indicated by a broken line). The bold lines represent mean values (including standard error) averaged over all patients.

Additional parameters of search behavior

Beyond the CoC, the additional variables search speed, search distance, and search strategy were obtained from the digitized cancelation tasks.

Search speed

Data are illustrated in Figure 6. Analysis of the Bells test revealed a significant main effect of group (F(1,15) = 6.719, p = 0.02, η_p² = 0.309), indicating that control patients found significantly more targets per second (M = 0.196, SD = 0.057) than neglect patients (M = 0.124, SD = 0.066). The main effect of screen size, on the other hand, was not significant (F(2,30) = 0.235, p = 0.792), indicating that a comparable number of targets was found per second in all three screen sizes. The interaction was not significant either (F(2,30) = 1.983, p = 0.155). The same analysis applied to the Letters test also revealed a significant main effect of group (F(1,13) = 8.624, p = 0.012, η_p² = 0.399), indicating again that control patients on average found more targets per second (M = 0.278, SD = 0.090) than neglect patients (M = 0.140, SD = 0.091). The main effect of screen size was significant as well (F(2,26) = 5.219, p = 0.012); the interaction was not significant (F(2,26) = 0.176, p = 0.839). According to post hoc comparisons (Fisher’s Least Significant Difference), more targets per second were found in condition TS large (M = 0.235, SD = 0.120) than in condition TS small (M = 0.183, SD = 0.066, p < 0.05, d = 0.537).
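As a reading aid, partial eta squared for a single effect can be recovered from the F value and its degrees of freedom via the standard identity F × df_effect / (F × df_effect + df_error). The short Python sketch below applies it to the two group effects on search speed reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Group effects on search speed as reported in the text.
bells_speed = partial_eta_squared(6.719, 1, 15)
letters_speed = partial_eta_squared(8.624, 1, 13)
print(round(bells_speed, 3), round(letters_speed, 3))
```

Both values round to the effect sizes reported for search speed (0.309 and 0.399).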

Figure 6. Additional search parameters over different test sizes. Averaged mean distance between two targets found in direct succession (left panel), averaged number of targets identified per time (middle panel), and averaged mean distance between two successive targets (right panel) (including standard error) in the three different touch screen formats of the digitalized cancellation tests in patients with and without spatial neglect (Neglect, No Neglect).

Search distance

Data are illustrated in Figure 6. Analysis of the Bells test revealed no main effect of screen size (F(2,30) = 0.250, p = 0.781) and no interaction (F(2,30) = 3.109, p = 0.059), but a main effect of group (F(1,15) = 7.357, p = 0.016, η_p² = 0.329). Search distance was smaller for control patients (M = 0.550, SD = 0.093) than for neglect patients (M = 0.674, SD = 0.093). For the Letters test, there was both a main effect of group (F(1,13) = 5.486, p = 0.036, η_p² = 0.292) and of screen size (F(2,26) = 5.528, p = 0.010). The interaction was not significant (F(2,26) = 0.244, p = 0.785). Again, search distance was smaller for control patients (M = 0.452, SD = 0.116) than for neglect patients (M = 0.594, SD = 0.115). Post hoc comparisons indicated that in the TS large version (M = 0.474, SD = 0.120) items found in direct succession were on average closer to one another than in the TS small (M = 0.552, SD = 0.143, p = 0.026, d = 0.590) and TS medium (M = 0.543, SD = 0.127, p = 0.018, d = 0.557) versions.

Search strategy

Data are illustrated in Figure 6. Analysis of this parameter for the Bells test showed neither a main effect of screen size (F(2,30) = 2.503, p = 0.099) nor an interaction (F(2,30) = 0.002, p = 0.998), while the main effect of group was significant (F(1,22) = 9.11, p < 0.01, η_p² = 0.502). Control patients scored significantly higher (M = 0.472, SD = 0.119) than neglect patients (M = 0.251, SD = 0.117), indicating that the search behavior of neglect patients was more strategic than that of control patients. Results for the Letters test revealed a main effect of group (F(1,13) = 15.787, p = 0.002, η_p² = 0.548), indicating that neglect patients’ search behavior was significantly more strategic (M = 0.180, SD = 0.51) than that of right-hemispheric control patients (M = 0.386, SD = 0.101). There was neither a main effect of screen size (F(2,26) = 0.513, p = 0.604) nor an interaction (F(2,26) = 1.177, p = 0.324).

Prediction of spatial neglect by the additional parameters of search behavior

To determine if the three additional parameters of search behavior provided by the digital format can be used to differentiate between right-hemispheric patients with and without spatial neglect, SVM were used. First, the binary diagnosis (neglect vs. no neglect) was predicted separately for the Bells and the Letters tests across all screen sizes, using Leave-One-Subject-Out Cross-Validation. Results showed that this cross-participant classification across screen sizes was highly accurate for the Bells test and for the Letters test with average balanced accuracy scores of 97.92% and 88.19%, respectively. The training and test accuracies for all models are shown in Figure 7. Second, chi-square tests of independence indicated that the frequency of accurately predicted neglect and right-hemispheric control patients (see Table 2) was independent of the screen size for the Bells (χ²(2) = 0.05, p = 0.973) and Letters test (χ²(2) = 0.18, p = 0.914). To investigate if the machine learning models solely predict neglect diagnosis as a proxy for lesion size, we tested if the models could accurately differentiate if a participant had an above or below average lesion volume compared to the sample. Results showed low accuracy for models trained on both tests (Bells test: 64.29%; Letters test: 54.76%), indicating that predictions were largely independent of lesion volumes.
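The chi-square test of independence used here can be illustrated with SciPy. The counts below are hypothetical placeholders and not the values of Table 2; rows represent correctly classified neglect and control patients, columns the three screen sizes, which gives the two degrees of freedom reported above.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of correctly classified patients (NOT the study data).
# Rows: neglect, right-hemispheric control; columns: TS small, medium, large.
correct_counts = [
    [11, 11, 10],
    [8, 7, 8],
]

chi2, p_value, dof, expected = chi2_contingency(correct_counts)
print(f"chi-square({dof}) = {chi2:.2f}, p = {p_value:.3f}")
```

A non-significant result in such a table indicates that the proportion of correct classifications in the two groups does not shift systematically with screen size.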

Figure 7. Machine learning model performance. Training and test performance of the machine learning models classifying the binary diagnosis “neglect vs. no neglect” in the right hemispheric patient sample overall and broken down by screen size (TS small, TS medium, and TS large).

Table 2. Frequency of correctly classified neglect and right-hemispheric control patients by screen size (TS small, TS medium, and TS large)

Discussion

Paper-and-pencil versus digital test version

Several papers have acknowledged numerous advantages of digitizing neuropsychological assessments in general (Bauer et al., Reference Bauer, Iverson, Cernich, Binder, Ruff and Naugle2012; Germine, Reinecke, & Chaytor, Reference Germine, Reinecke and Chaytor2019) and neglect diagnostics specifically (Donnelly et al., Reference Donnelly, Guest, Fairhurst, Potter, Deighton and Patel1999; Bonato, Priftis, Marenzi, Umiltà, & Zorzi, Reference Bonato, Priftis, Marenzi, Umiltà and Zorzi2012; Bonato & Deouell, Reference Bonato and Deouell2013). This has inspired the introduction of novel computer-based neglect assessments (Donnelly et al., Reference Donnelly, Guest, Fairhurst, Potter, Deighton and Patel1999; Deouell, Sacher & Soroker, Reference Deouell, Sacher and Soroker2005; Bar-Haim, Kizony, Shahar, & Katz, Reference Bar-Haim Erez, Kizony, Shahar and Katz2006; List et al., Reference List, Brooks, Esterman, Flevaris, Landau, Bowman, Stanton, Vanvleet, Robertson and Schendel2008; Bonato, Priftis, Umiltà, & Zorzi, Reference Bonato, Priftis, Umiltà and Zorzi2013; Dalmaijer et al., Reference Dalmaijer, Van der Stigchel, Nijboer, Cornelissen and Husain2015; Villarreal et al., Reference Villarreal, Linnavuo, Sepponen, Vuori, Jokinen and Hietanen2020). Digital versions have been argued to be more flexible, allowing the creation of several parallel versions of a specific task and thereby preventing learning effects from numerous repetitions of one identical version, for example, in the course of rehabilitation (Bonato & Deouell, Reference Bonato and Deouell2013). They can further be designed to adapt immediately to patients’ individual performance (List et al., Reference List, Brooks, Esterman, Flevaris, Landau, Bowman, Stanton, Vanvleet, Robertson and Schendel2008).
Moreover, digital formats could further increase a test’s sensitivity by increasing the amount of information extractable from its data (Bonato & Deouell, Reference Bonato and Deouell2013; Dalmaijer et al., Reference Dalmaijer, Van der Stigchel, Nijboer, Cornelissen and Husain2015). However, previous studies also stressed the importance of validating digital formats (Bauer et al., Reference Bauer, Iverson, Cernich, Binder, Ruff and Naugle2012; Germine et al., Reference Germine, Reinecke and Chaytor2019). The present paper is, to our knowledge, the first to systematically compare patients’ performances between digital and analog formats. Stroke patients’ CoC derived from cancelation tasks seems robust to test digitization. Thus, it seems safe to introduce digitized diagnostic measures (at least within the range of size variations investigated in the present study) and keep the existing cut-off scores, without having to fear distortions in the CoC and related diagnostic decisions.