In short, “Telemedicine” (TM) is the delivery of healthcare services using technology while remaining at a physical distance (1). Video consultation (VC) is a form of TM, which allows healthcare providers to communicate with their patients over a real-time video connection (Reference Westra and Niessen2). VC holds great potential for patients to receive care in the comfort of their own home, resulting in several benefits for both the patient and the healthcare provider (Reference Bradbury, Patrick-Miller, Harris, Stevens, Egleston and Smith3). Despite its promise, successful implementation of VC that results in routine provision of care is scarce (Reference Huygens, Vermeulen, Friele, van Schayck, de Jong and de Witte4). Many initiatives appear not to be systematically embedded in care processes and risk becoming dormant once the initial start-up funding has ended.
In order to evaluate and validate the use of VC in health care, patient satisfaction with professional consultation via VC is regarded as an important driver. If patient satisfaction is well established, policy makers can be directed toward sustainable implementation. The literature on VC abounds with studies concerning patient satisfaction, generally suggesting favorable results concerning a decrease in patient-related expenditures while maintaining face-to-face interaction with a caregiver (Reference Kruse, Krowski, Rodriguez, Tran, Vela and Brooks5). The authors use a diverse range of questionnaires to measure patient satisfaction, resulting in heterogeneous data which make it difficult to compare and combine results. Cross-situational evidence is needed to support a stronger business case for policy makers. In other words, to determine patient satisfaction with VC, a valid and reliable tool which allows a consistent assessment is of importance.
Although reviews in literature have summarized the evaluation of available assessment tools to date (Reference Langbecker, Caffery, Gillespie and Smith6–Reference Mair, Haycox, May and Williams8), these studies focus on TM as a whole, not specifically on VC. Furthermore, available reviews focus on questionnaire development and validation studies only, without including other available studies on the evaluation of their measurement properties. In addition, a comprehensive overview of the evaluation of the quality of measurement properties is not available. Hence, an evidence-based recommendation in the selection of the most suitable questionnaire for patient satisfaction with VC is lacking.
In this systematic review, the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) methodology and guidelines are used to critically appraise and summarize the measurement properties of all available validated questionnaires that measure patient satisfaction with VC. The primary aim is to arrive at an evidence-based recommendation on the most suitable questionnaire for measuring patient satisfaction with VC, one that can be used across settings and by multidisciplinary teams.
Methods
This systematic review has been reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) guidelines in combination with the protocol for systematic reviews of measurement properties recommended by the COSMIN panel. This study was registered with PROSPERO (CRD42016051841) (Reference Moher, Liberati, Tetzlaff, Altman and Group9).
Search Criteria
A systematic literature search was performed to provide an overview of questionnaires used to measure patient satisfaction regarding VC. Empirical studies reporting the development and validation of patient satisfaction questionnaires concerning VC were included. VC was classified as “any type of consultation facilitated or supported by using a real-time video connection between a healthcare provider and a patient.” Store-and-Forward connections, and studies that shared information asynchronously, were excluded from the analysis. In addition, studies reporting the use of video with the main purpose of remote patient monitoring were excluded as well.
Patient satisfaction was defined as “patients’ reported opinion regarding the use of VC in consultation in any medical setting.” Articles solely measuring healthcare providers' satisfaction were excluded, as were studies conducting a standard patient satisfaction questionnaire without any relation to VC technology. The search strategy consisted of taxonomic matching terms of VC, physician–patient relations, and patient satisfaction (Supplementary Table 1).
Selection of Articles
The PubMed database, Embase database, and Cochrane Library were searched for relevant peer-reviewed articles. The search was conducted on 2 August 2019. Conference proceedings and reviews were not considered eligible for inclusion. Reports that did not relate to patient satisfaction or conducting patient satisfaction questionnaires and studies investigating the internal validity or technological aspects of a VC system were excluded from the analysis. Two reviewers screened all reports on title and abstract according to the aforementioned criteria. Reports deemed “relevant,” “dubious,” or “unknown” were examined in full text. The reference lists of the reports assessed for eligibility were searched for other relevant reports. Grey literature was searched using Google Scholar. None of the reports were excluded based on language. In case of missing data, the Internet was searched and study authors were contacted directly.
Data Extraction
The following general study characteristics were extracted from all reports: the name of the questionnaire, year of publication, study location, study design, and the purpose of using VC. Data concerning the questionnaires' measurement properties included: type(s) of validity and reliability assessment, participants included in validation assessment, sample size, and other characteristics related to the measurement properties. All necessary data were collected by EZB and EVH.
Assessment of Methodological Quality of Included Studies
To evaluate the methodological quality of the questionnaires, the COSMIN Risk of Bias checklist was used (Reference Terwee, Mokkink, Knol, Ostelo, Bouter and de Vet10). This checklist is a standardized tool to assess studies on measurement properties. It contains the assessment of several measurement properties on design aspects and statistical methods. For each study, two independent reviewers (EZB and EVH) assessed the methodological quality of items based on a four-point rating scale (inadequate, doubtful, adequate, and very good) (Reference Terwee, Mokkink, Knol, Ostelo, Bouter and de Vet10). The overall score is determined by the lowest rating of any item on the checklist. In case of disagreement, there was a discussion to reach consensus.
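The "lowest rating counts" rule described above can be illustrated with a minimal Python sketch. The function name and the example item ratings are hypothetical and serve only to show how an overall score follows from the item-level ratings; they are not taken from the included studies.

```python
# Four-point scale named in the text, ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_quality(item_ratings):
    """Return the lowest rating among the checklist items (worst score counts)."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for a rating that is not on the scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
example_items = ["very good", "adequate", "doubtful", "very good"]
print(overall_quality(example_items))  # doubtful
```

A single "doubtful" item is enough to pull the overall score down to "doubtful", which is why incomplete reporting on one checklist item can dominate a study's rating.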
Assessment of the Measurement Property of a Questionnaire
The result of each study was rated independently by EZB and EVH according to the updated criteria for good measurement properties (Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet11). The measurement property of a questionnaire was labeled as either being sufficient (+), insufficient (−), or indeterminate (?).
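As an illustration of how such a three-way rating can be derived for internal consistency, the sketch below applies a commonly cited form of the criteria. The 0.70 threshold for Cronbach's α and the precondition of sufficient structural validity are assumptions that this article does not state explicitly, and the function name and inputs are hypothetical.

```python
ALPHA_THRESHOLD = 0.70  # assumed cut-off; not stated in this article


def rate_internal_consistency(alphas, structural_validity_sufficient):
    """Rate internal consistency as '+', '-', or '?'.

    alphas: Cronbach's alpha for each unidimensional scale or subscale.
    structural_validity_sufficient: True when there is at least some evidence
        of sufficient structural validity, otherwise False.
    """
    # Without evidence on structural validity the rating stays indeterminate.
    if not structural_validity_sufficient or not alphas:
        return "?"
    if all(alpha >= ALPHA_THRESHOLD for alpha in alphas):
        return "+"
    # Simplification: any subscale below the threshold counts as insufficient.
    return "-"


print(rate_internal_consistency([0.85, 0.91], structural_validity_sufficient=False))  # ?
```

Under these assumptions, a questionnaire with high α values can still receive an indeterminate rating when its structural validity has not been documented.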
Evidence Synthesis and Generating Recommendations
A summary of the strength of the evidence for the measurement properties for each questionnaire, including an overview of measurement properties, is provided. The quality of the evidence is qualitatively summarized and graded using the modified GRADE approach labeling findings as high, moderate, low, or very low in evidence. The modified GRADE approach uses the following three factors to determine the quality of the evidence: the risk of bias, the inconsistency of the results, and indirectness (Reference Mokkink, de Vet, Prinsen, Patrick, Alonso and Bouter12).
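The grading logic can be sketched as follows. This is a simplified illustration that assumes the evidence starts at "high" and is lowered one level per downgrade step. The number of steps assigned to each factor is a hypothetical input chosen by the reviewer and is not prescribed in this article.

```python
# Evidence levels named in the text, ordered from best to worst.
LEVELS = ["high", "moderate", "low", "very low"]


def grade_evidence(risk_of_bias=0, inconsistency=0, indirectness=0):
    """Start at 'high' and move down one level per downgrade step.

    Each argument is the (hypothetical) number of levels to downgrade
    for that factor; the result never drops below 'very low'.
    """
    steps = (risk_of_bias, inconsistency, indirectness)
    if any(step < 0 for step in steps):
        raise ValueError("downgrade steps must be non-negative")
    return LEVELS[min(sum(steps), len(LEVELS) - 1)]


print(grade_evidence(risk_of_bias=1))                   # moderate
print(grade_evidence(risk_of_bias=2, indirectness=2))   # very low
```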
Results
Study Selection
The systematic search identified 2,348 articles. Cross-reference search identified eighteen additional articles. After removing duplicates (n = 353), 1,995 articles remained and were screened for relevance on title and abstract. A total of 268 were eligible for full-text screening. A total of twelve articles reported the development of a questionnaire or described its measurement properties (Reference Bradbury, Patrick-Miller, Harris, Stevens, Egleston and Smith3;Reference Allen and Hayes13–Reference Yoder, McFall and Cancio23). Across the twelve included articles, ten different questionnaires were evaluated. Figure 1 illustrates the PRISMA flow chart of the process. An overview of the general characteristics of the included studies is reported in Table 1.
TMPQ, Telemedicine Perception Questionnaire; TSQ, Telemedicine Satisfaction Questionnaire; TSUQ, Telemedicine Satisfaction and Usefulness Questionnaire; TUQ, Telehealth Usability Questionnaire; GP, general practitioner.
Assessment of Methodological Quality of Included Studies
According to the COSMIN checklist, the methodological quality of the studies varied from “inadequate” (Reference Bradbury, Patrick-Miller, Harris, Stevens, Egleston and Smith3;Reference Allen and Hayes13;Reference Demiris, Speedie and Finkelstein15;Reference Otten, Birnie, Ranchor and van Langen20;Reference Yip, Mackenzie and Chan24) to “very good” (Reference Bakken, Grullon-Figueroa, Izquierdo, Lee, Morin and Palmas14;Reference Demiris, Speedie and Finkelstein15;Reference Mekhjian, Turner, Gailiun and McCain19). However, most studies achieved “inadequate” to “doubtful” scores (Reference Bradbury, Patrick-Miller, Harris, Stevens, Egleston and Smith3;Reference Allen and Hayes13;Reference Demiris, Speedie and Finkelstein15;Reference Otten, Birnie, Ranchor and van Langen20–Reference Fatehi, Gray, Russell and Paul25). Internal consistency was the main measurement property that was assessed. All studies used the Cronbach's α value to measure internal consistency (ranging from α >.76 to .93). Out of the twelve studies, three studies scored “very good” on internal consistency (Reference Bakken, Grullon-Figueroa, Izquierdo, Lee, Morin and Palmas14;Reference Demiris, Speedie and Finkelstein15;Reference Mekhjian, Turner, Gailiun and McCain19). Only a few studies examined reliability (Reference Demiris, Speedie and Finkelstein15;Reference Jahromi and Ahmadian18;Reference Yip, Mackenzie and Chan24) and structural validity (Reference Mekhjian, Turner, Gailiun and McCain19;Reference Yip, Mackenzie and Chan24).
Lower quality ratings were mostly caused by not assessing or describing the dimensionality of a questionnaire and not assessing internal consistency for every subscale of a questionnaire separately. An overview of the methodological quality of the included studies is reported in Supplementary Tables 2a and 2b.
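For readers less familiar with the statistic, Cronbach's α for a subscale with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below computes it for a single subscale with Python's standard library; the function name and the response data are hypothetical and do not come from the included studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha for one subscale.

    scores: list of respondents, each a list with one score per item.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(scores[0])
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of four patients to a three-item subscale.
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
alpha = cronbach_alpha(responses)
```

Running the function once per subscale, rather than once over all items, corresponds to the separate per-subscale assessment whose absence lowered the quality ratings.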
Assessment of Measurement Properties and Evidence Synthesis
A summary of the strength of the evidence for the measurement properties for each questionnaire, including an overview of measurement properties, is reported in Table 2.
Plus sign (+), sufficient overall rating measurement property; question mark (?), indeterminate overall rating measurement property; NA, data not available. High indicates that we are very confident that the true measurement property lies close to that of the estimate of the measurement property; moderate, we are moderately confident in the measurement property estimate: the true measurement property is likely to be close to the estimate of the measurement property, but there is a possibility that it is substantially different; low, our confidence in the measurement property estimate is limited: the true measurement property may be substantially different from the estimate of the measurement property; and very low, we have very little confidence in the measurement property estimate: the true measurement property is likely to be substantially different from the estimate of the measurement property. The modified GRADE approach was used to grade the overall quality of evidence (Reference Terwee, Bot, de Boer, van der Windt, Knol and Dekker26).
Telehealth Usability Questionnaire (TUQ)
The TUQ was developed by Parmanto et al. (Reference Parmanto, Lewis, Graham and Bertolet21) and is based on several reported questionnaires in literature. The multidimensional questionnaire contains twenty-one items. The TUQ was assessed in one study for internal consistency in which the Cronbach's α was high for every subscale measured. However, the quality of the study was rated “doubtful” because the population used in the study did not match the target group of the questionnaire. The TUQ received an indeterminate rating because there was not enough information available on the structural validity. None of the articles retrieved by this systematic review mentioned the TUQ as a reference; however, two studies outside the scope of this review report the use of the TUQ (Reference Schutte, Gales, Filippone, Saptono, Parmanto and McCue27;Reference Faett, Brienza, Geyer and Hoffman28).
Telemedicine Satisfaction and Usefulness Questionnaire (TSUQ)
The TSUQ developed in both English and Spanish was based on the TMPQ and includes fourteen items (Reference Bakken, Grullon-Figueroa, Izquierdo, Lee, Morin and Palmas14). The questionnaire was assessed on internal consistency and structural validity. Based on the COSMIN checklist, the risk of bias was rated as “very good” and a large sample size was used. Therefore, the quality of evidence for internal consistency was rated as “high.” Several studies mentioned the TSUQ as a reference (Reference Fatehi, Martin-Khan, Smith, Russell and Gray16;Reference Toledo, Triola, Ruppert and Siminerio29;Reference Vriezinga, Borghorst, van den Akker-van Marle, Benninga, George and Hendriks30).
Telemedicine Satisfaction Questionnaire (TSQ)
The development of TSQ was based on previous literature after which a panel of doctors, nurses, and experts in TM reviewed the questionnaire (Reference Otten, Birnie, Ranchor and van Langen20). Changes were made accordingly. The TSQ and the translated Dutch version were assessed by two separate studies. Although the authors researched four measurement properties, the risk of bias was assessed as “inadequate” to “doubtful” due to a small sample size and a lack of description of the methods used. The TSQ is mentioned by multiple studies in- and outside the scope of this systematic review (Reference Otten, Birnie, Ranchor and van Langen20;Reference Goulis, Giaglis, Boren, Lekka, Bontis and Balas31–Reference Thomas, Novins, Hosokawa, Olson, Hunter and Brent33).
Telemedicine Perception Questionnaire (TMPQ)
The TMPQ was developed based on published literature as well as focus groups with patients (Reference Demiris, Speedie and Finkelstein15;Reference Yoder, McFall and Cancio23). The measurement properties were assessed in two studies. Both studies included small sample sizes and the risk of bias was assessed “inadequate” to “doubtful” mainly because of a lack of a description of the used methods. The TMPQ is mentioned by one other author within the scope of this systematic review (Reference Finkelstein, Speedie, Zhou, Potthoff and Ratner34).
Other
Several authors mentioned the lack of a validated questionnaire and therefore designed a specifically tailored questionnaire to use in their study (Reference Dham, Gupta, Alexander, Black, Rajji and Skinner35–Reference Hatton, Chandra, Lucius and Ciuchta37). All reported questionnaires were developed based on previously used items in the literature (Reference Allen and Hayes13;Reference Gattas, MacMillan, Meinecke, Loane and Wootton38;Reference Loane, Bloomer, Corbett, Eedy, Gore and Mathews39). The risk of bias was rated “inadequate” to “doubtful,” with the exception of the questionnaire developed by Mekhjian et al. (Reference Mekhjian, Turner, Gailiun and McCain19). This research group designed a questionnaire achieving validity on internal consistency and included a large sample size. The study was therefore assessed as being of high methodological quality concerning internal validity.
Recommendations for the Most Suitable Questionnaire to Measure Patient Satisfaction with VC
The study performed by Mekhjian et al. (Reference Mekhjian, Turner, Gailiun and McCain19) carries the best evidence for the validity of the measurement properties (Table 2). Although other authors have adapted this questionnaire after publication, new validity studies have not yet been published. The TSUQ scored high quality on internal validity but the overall rating was indeterminate. The quality of the hypotheses testing for construct validity was considered moderate. The TSUQ is also reported by other authors. The questionnaire used by Mekhjian et al. (Reference Mekhjian, Turner, Gailiun and McCain19) was designed for inmates of the Ohio prison. VC was used in order to prevent inmates from having to leave the prison to receive medical care. The care received came from multiple disciplines, resulting in a more general focus on patient satisfaction with VC. Although the TSUQ scored second best, this questionnaire was specifically designed for diabetes care, making it more difficult to use across disciplines.
Discussion
This systematic search retrieved twelve studies evaluating the measurement properties of ten different questionnaires to assess patient satisfaction with VC. A validated questionnaire that can be considered a gold standard when respecting the COSMIN criteria checklist was not identified to date.
According to the COSMIN checklist for validation, the methodological quality of most studies was identified as “inadequate” to “doubtful.” The main reasons for this were inadequate sample sizes and/or inadequate reporting of missing items. Evidence for the psychometric properties of the questionnaires was limited, with few positive results on reliability and validity. The findings of this review are in agreement with previous systematic reviews, reporting the lack of validation (Reference Mair, Haycox, May and Williams8).
An explanation for the lack of validated questionnaires in literature might be that the use of VC in clinical practice varies across healthcare settings. This makes the development of a patient satisfaction questionnaire that can be used across settings complex. As a result, questionnaires are designed to acquire knowledge for a specific specialty and tailored for a specific situation or indication to ensure their relevance. Furthermore, it results in a potpourri of questionnaires for single-use, not contributing to a robust body of evidence. Indeed, it is important that questionnaires are externally validated across settings. Therefore, it is encouraging that the recommended questionnaire based on this systematic review seems to be a good fit for multiple disciplines because of its interdisciplinary use.
This study systematically summarizes available evidence on the measurement properties of various questionnaires developed to assess patient satisfaction with VC. The use of the standardized COSMIN methodology for critical appraisal of the methodological quality is important in assessing outcome and has not been done before. However, one may comment that the strictness of the COSMIN methodology makes it hard for development studies to obtain a sufficient score when study quality is perceived to be sufficient but not excellent. In addition, questionnaires developed before the COSMIN guidelines were published may not have reported their methodology sufficiently to be ranked properly. Hence, it must be noted that we cannot conclude from our findings that questionnaires scoring low on the COSMIN criteria are in fact “inadequate.” A low score merely demonstrates that the questionnaires have not been tested extensively or, more importantly, that the methodology has not been reported accurately and findings cannot be assessed validly across situations. Furthermore, as Bagot et al. (Reference Bagot, Bladin and Cadilhac40) pointed out, the selection of keywords when submitting an empirical paper is of great importance. Due to the omission or improper use of relevant keywords, it is possible that relevant articles were not retrieved via our search strategy. Despite our effort to retrieve additional relevant articles through cross-referencing, this might have resulted in an incomplete overview of all the patient satisfaction questionnaires in the current medical literature.
As a rapid increase of technology leads to a predominance of pilot and feasibility studies with often small sample sizes, validating a specific questionnaire is time consuming, costly, and may not be a priority. The adaptation of a well-designed questionnaire to a specific situation is an elegant and smart solution when both information and evidence are needed. In that case, careful consideration of which questionnaire items to use might facilitate the combination of outcomes for meta-analyses in order to generalize results and improve the quality of the outcome. Collaborative research efforts to jointly use and collect the same outcome measurements may provide progress in increasing the quality of the assessment of patient satisfaction with VC.
Apart from future studies on the measurement properties of the questionnaire of Mekhjian et al. (Reference Mekhjian, Turner, Gailiun and McCain19) and the TSUQ, a valuable suggestion is to define which domains and questionnaire items are important to measure patient satisfaction extensively. Garcia and Adelakun (Reference Garcia and Adelakun7) made the first attempt to identify contributing dimensions of patients’ satisfaction with VC. They proposed a framework that provides guidance on which generic dimensions should be included to measure patient satisfaction appropriately. However, examples of questionnaire items to be used were not provided. Consensus on these items could establish a standardized collection of outcomes and might counteract heterogeneity in outcome measurements. This could facilitate a large step toward a uniform assessment of patient satisfaction with VC.
Conclusion and Recommendations
This systematic review indicates that high-quality studies on the measurement properties of questionnaires assessing patient satisfaction with VC are scarce, although diverse authors have called for such an instrument. The questionnaire reported by Mekhjian et al. scores highest on methodological quality according to the COSMIN Risk of Bias checklist and seems to have the most potential for future use across settings. Additional validity studies on this questionnaire need to be published to further consolidate this recommendation.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0266462320000367
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflict of Interest
The authors declare that there is no conflict of interest.