Introduction
Patient experience is an increasingly important part of assessing the quality of oncology care. This includes the evaluation of patients’ perception of their experience through the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Cancer Care Survey (Evensen et al. Reference Evensen, Yost and Keller2019) and interest in patient-reported outcome (PRO)–based performance measures (PRO-PMs) (Stover et al. Reference Stover, Urick and Deal2020). A major challenge is that patients may be unable to self-report, and thus a proxy may be required. Previous research has demonstrated reasonable levels of concordance between paired proxy and patient reports for PROs (Roydhouse and Wilson Reference Roydhouse and Wilson2017; Sneeuw et al. Reference Sneeuw, Sprangers and Aaronson2002; Tang and McCorkle Reference Tang and McCorkle2002), although proxies reported worse patient health-related quality of life than patients themselves did (Roydhouse and Wilson Reference Roydhouse and Wilson2017; Sneeuw et al. Reference Sneeuw, Sprangers and Aaronson2002; Tang and McCorkle Reference Tang and McCorkle2002). Interestingly, paired concordance studies have shown that proxies report better care satisfaction (Castle Reference Castle2005; Gasquet et al. Reference Gasquet, Dehe and Gaudebout2003) and quality (Giovannetti et al. Reference Giovannetti, Reider and Wolff2013) than patients do.
Understanding the potential impact of proxy reports is important for fair comparisons across oncology practices. Previous research has demonstrated an association between poorer patient-perceived physician communication and unmet patient needs for symptom management (Walling et al. Reference Walling, Keating and Kahn2016). Furthermore, symptom burden is central to PRO-PMs (Stover et al. Reference Stover, Urick and Deal2019a, Reference Stover, Urick and Deal2020) and may inform comparisons of oncology practices (Stover et al. Reference Stover, Urick and Deal2019a).
The objective of this study, a secondary data analysis, was to investigate the potential impact of proxy reports on the assessment of care quality and experience in cancer. The specific aims were (1) to compare perceptions of care quality and experience and of symptom management and burden by respondent type (patient/proxy); (2) to assess the concordance of patient and proxy reports of symptom management and care quality and experience; and (3) to estimate the percentage of respondents giving the top score for each respondent type, within subgroups defined by symptom management/distress and by practice type. Earlier work using patient-reported problem scores from the original study suggested there was greater room for improvement in comprehensive care settings than in private practice (Teno et al. Reference Teno, Lima and Lyons2009). As patient-reported experience can be used to differentiate between sites, we used practice type as a subgroup to examine the potential impact of proxy reports on such a comparison.
For experience outcomes, we focused on communication, because of previous findings showing an association with symptom management (Walling et al. Reference Walling, Keating and Kahn2016), as well as on perception of provider help with symptom management. For all outcomes, following CAHPS, we focused on the highest scores because experience outcomes are often skewed (Havyer et al. Reference Havyer, van Ryn and Wilson2019), with most respondents selecting the top/highest values (Kemp et al. Reference Kemp, Ahmed and Quan2018; Takeshita et al. Reference Takeshita, Wang and Loren2020). We hypothesized that scores for quality and experience would be high regardless of respondent type. Additionally, given interest in PRO-PMs for systemic therapy (Stover et al. Reference Stover, Urick and Deal2020), we focused on the dyads in which both the patient and the proxy reported that the patient had received chemotherapy (either alone or with radiotherapy).
Methods
Data source and study design
This was a retrospective, secondary data analysis of an existing study. Because only de-identified data were used, the Brown University Institutional Review Board (IRB) deemed this study not human subjects research, and IRB approval was not required. The original study, which was approved by the Brown University IRB, was a prospective cohort study that recruited patients diagnosed with advanced cancer in New Hampshire, Connecticut, and Rhode Island (Teno et al. Reference Teno, Lima and Lyons2009). Data were collected from 2006 to 2007 at 2 time points: after initial diagnosis or progression to advanced disease and after receiving treatment (Teno et al. Reference Teno, Lima and Lyons2009). There were 119 patients (of 206 recruited) (Teno et al. Reference Teno, Lima and Lyons2009) who had caregivers willing to complete questionnaires at time 1, with data from 99 dyads available at time 2. The study population consisted of 83 dyads in which both members reported that the patient received chemotherapy and for which questionnaire data were available at both time points. We were interested in investigating these questions at more than one time point because studies evaluating proxy–patient concordance for PROs related to patient health have not shown consistent results over time (Milne et al. Reference Milne, Mulder and Beelen2006). As perceptions of health may change over time, it is plausible that perceptions of care experience change as well, particularly because health status is an important predictor of satisfaction with care (Hall et al. Reference Hall, Milburn and Epstein1993).
Measures
In both surveys, proxies were told that they were being asked to answer questions about the patient’s experiences with cancer care. No explicit directions were provided regarding perspective taking (e.g., use of their own perspective or answering as they believe the patient would have) (Pickard and Knight Reference Pickard and Knight2005).
Quality and experience outcomes
Prior to commencing questions in this section of the questionnaire, patients were given the following instructions at time 1: “Now I would like you to think about your overall experience with your oncologist, your cancer care providers and oncology office staff by rating some aspects of the care you have received. For the next set of questions, I will be asking you to respond using a scale from 0 to 10, where 0 means the worst care possible and 10 means the best care possible.” Proxy instructions were the same, but with “your” replaced by “his/her,” referring to the patient. The instructions were the same at time 2, except for both respondents, the first sentence ended with “within the past month.”
Quality of care was assessed with a 5-point scale ranging from poor to excellent. Respondents were asked, “Overall, how would you rate the care that you [PATIENT] have received from your (his/her) oncologist?” The same question was administered at time 1 and time 2. Communication was assessed by the question, “Using a scale of 0–10, how well do your [PATIENT] cancer care providers communicate with you [PATIENT] and your (his/her) family/friends about your (his/her) cancer and the likely outcomes of care?”; for patients and proxies, 0 was the worst score and 10 was the best score. The same question was administered at time 1 and time 2. Experience with provider help for symptom control was examined with the question, “Using a scale of 0–10, how well do your [PATIENT] cancer care providers make sure that your (his/her) symptoms (e.g., pain and shortness of breath) are controlled to a degree that is acceptable to you (him/her)?” At time 2, the question was the same except “nausea” was added as another example of a symptom for both respondents: “(e.g., pain, shortness of breath, and nausea).”
Assessment of symptoms
Pain, trouble breathing, and feelings of anxiety or distress were assessed at both time points. Diarrhea and nausea/vomiting were assessed at time 2 but not time 1. If respondents indicated that a symptom was present in the past month (yes/no), they answered follow-up questions, including how much the symptom bothered or distressed the patient (not at all to very much) and the amount of help received for the symptom (less than needed/about the right amount/more than needed).
For this analysis, we operationalized symptom burden and management in 2 ways. First, using only symptoms that were measured at both time points (pain, trouble breathing, and anxiety/distress), we created a 2-level variable to determine the presence and management of symptoms (at least 1 symptom with an undesired level of help/no symptoms or a desired level of help if symptoms present). This derived variable measures unmet needs for symptom management. Second, because pain is one of the symptoms that met inclusion criteria for PRO-PMs (Stover et al. Reference Stover, Urick and Deal2019b) and the literature has examples of generally accurate (Milne et al. Reference Milne, Mulder and Beelen2006) and less accurate (Montgomery et al. Reference Montgomery, Vos and Raybin2021) proxy reporting of symptom distress, we created a binary variable to indicate whether patients had any level of distress or bother from pain (no pain or pain without distress or bother/pain and some level of distress or bother). A top score for this variable indicated no pain or pain without distress/bother. This derived variable measures symptom burden.
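The construction of the two derived variables can be sketched as follows. This is an illustrative Python sketch, not the study’s code; the function names and response codings ("less"/"right"/"more", "not at all") are hypothetical stand-ins for the questionnaire’s actual coding.

```python
# Sketch of the two derived symptom variables described above.
# Illustrative only; field names and response codings are hypothetical.

def unmet_need(symptoms):
    """Unmet need for symptom management (first derived variable).

    symptoms: list of (present, help_received) pairs for pain, trouble
    breathing, and anxiety/distress; help_received is "less", "right",
    or "more" when the symptom is present, else None.
    Returns True if at least 1 present symptom had an undesired
    (not "right") level of help.
    """
    return any(present and help_received != "right"
               for present, help_received in symptoms)

def pain_top_score(pain_present, pain_bother):
    """Symptom burden (second derived variable), scored so that the
    top score (True) means no pain, or pain without distress/bother."""
    if not pain_present:
        return True
    return pain_bother == "not at all"
```

For example, a respondent reporting pain with less help than needed and no other symptoms would be flagged as having an unmet need, while a respondent with pain but no bother from it would receive the top score on the burden variable.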
Variables describing study sample
Except for cancer type and practice type (comprehensive cancer center or private practice), all other variables were patient- or proxy-reported. Detailed information about the proxy–patient relationship was also elicited from the proxy with questions about the frequency of assisting the patient, discussion of the medical condition and attendance at oncologist visits (never to always), and extent of involvement in cancer treatment decisions (not at all to very much). Proxies also rated their confidence in understanding the impact of cancer on the patient’s health and how treatment decisions were made on 0–10 scales (not to very). Proxies and patients also rated their own overall health status (poor to excellent), level of distress (0–10), and level of psychological distress and well-being using the validated Mental Health Inventory-5 (MHI-5) (Berwick et al. Reference Berwick, Murphy and Goldman1991; Rumpf et al. Reference Rumpf, Meyer and Hapke2001). For consistency, all health-related questions and scales were transformed so that higher scores indicated better responses.
Data analyses
Frequencies, means, and standard deviations were used to describe the data. Because of the small sample size, we focused on descriptive rather than inferential statistics. Perceptions of quality and care experience and of symptom management and burden (Aim 1) were summarized using descriptive statistics. Kendall’s tau was used to evaluate the association between proxy and patient responses on the rating scales for provider communication and provider efforts with symptom control. Concordance (Aim 2) was assessed using several metrics. Although kappa is widely used to evaluate inter-rater concordance (Shankar and Bangdiwala Reference Shankar and Bangdiwala2014), its performance can be affected by prevalence (Byrt et al. Reference Byrt, Bishop and Carlin1993; Feinstein and Cicchetti Reference Feinstein and Cicchetti1990; Flight and Julious Reference Flight and Julious2015), which is of concern given that experience data are skewed. Our primary measure was therefore Gwet’s AC1 (Gwet Reference Gwet2008), but for comprehensiveness, we also assessed Cohen’s kappa (Cohen Reference Cohen1960), percent agreement, and the specific agreement indices p_pos and p_neg (Cicchetti and Feinstein Reference Cicchetti and Feinstein1990). Kappa, Gwet’s AC1, and percent agreement were calculated using the irrCAC package (Gwet Reference Gwet2019). p_pos was calculated as the proportion of agreement for top (best) scores and p_neg as the proportion of agreement for scores below the top, using Cicchetti and Feinstein’s formulas (Cicchetti and Feinstein Reference Cicchetti and Feinstein1990).
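For a binary top-score indicator, all of these agreement statistics can be computed from a 2×2 patient-by-proxy table. The study used the irrCAC R package; the Python sketch below, with invented counts (not study data), illustrates the formulas and why kappa can be attenuated while Gwet’s AC1, percent agreement, and p_pos remain high when responses cluster at the top score.

```python
# Agreement statistics for a 2x2 patient-by-proxy table of a binary
# "top score" indicator. Counts are invented for illustration.
# a = both top, b = patient top only, c = proxy top only, d = neither top
a, b, c, d = 60, 8, 5, 10
n = a + b + c + d

po = (a + d) / n                       # percent (observed) agreement

# Cohen's kappa: chance agreement from the product of the two marginals
p_pat, p_prox = (a + b) / n, (a + c) / n
pe_kappa = p_pat * p_prox + (1 - p_pat) * (1 - p_prox)
kappa = (po - pe_kappa) / (1 - pe_kappa)

# Gwet's AC1: chance agreement from the average marginal proportion
pi = (p_pat + p_prox) / 2
pe_ac1 = 2 * pi * (1 - pi)
ac1 = (po - pe_ac1) / (1 - pe_ac1)

# Cicchetti & Feinstein's indices of specific agreement
p_pos = 2 * a / (2 * a + b + c)        # agreement on top scores
p_neg = 2 * d / (2 * d + b + c)        # agreement on <top scores
```

With these skewed counts, percent agreement (≈0.84), AC1 (≈0.77), and p_pos (≈0.90) are all high, while kappa (≈0.51) and p_neg (≈0.61) are lower — the same pattern reported for several outcomes in the Results.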
For the subgroup analyses (Aim 3), we estimated the percentage of top scores by respondent type for each level of symptom management and burden at each time point. We repeated this analysis for subgroups defined by practice type. For all aims, each outcome was evaluated separately. R (v. 4.0.1) was used for all analyses.
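The Aim 3 tabulation amounts to computing the percentage of top scores within each cell of respondent type × subgroup level. A minimal Python sketch, with invented records rather than study data:

```python
# Sketch of the Aim 3 subgroup tabulation: percentage of top scores by
# respondent type within levels of a subgroup variable. Records invented.
from collections import defaultdict

records = [
    # (respondent, subgroup level, gave top score?)
    ("patient", "inappropriate help", True),
    ("patient", "inappropriate help", False),
    ("proxy",   "inappropriate help", False),
    ("patient", "appropriate help",   True),
    ("proxy",   "appropriate help",   True),
    ("proxy",   "appropriate help",   True),
]

tallies = defaultdict(lambda: [0, 0])   # (respondent, level) -> [top, total]
for respondent, level, top in records:
    tallies[(respondent, level)][0] += int(top)
    tallies[(respondent, level)][1] += 1

pct_top = {cell: 100 * top / total for cell, (top, total) in tallies.items()}
```

Each outcome would be tabulated separately in this way, once per time point.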
Missing data
Missing values were infrequent, with only 1–3 outcome scores missing at each time point. Available case analysis was used.
Results
Study population
Most patients received care from a private practice (67%) rather than a comprehensive care site (33%). Patients were older than their caregivers (mean age 65.6 [SD = 10.7] vs. 59.4 [SD = 12.8]), whereas a higher proportion of proxies than patients had received at least a university-level education (28% vs. 20%). Most proxies were the patient’s partner or spouse (69%), and nearly all reported being in contact with the patient every day (95%). Proxies reported being highly engaged in care, with the majority “always” discussing the patient’s medical condition with them (54%) and attending oncologist visits (63%), as well as being “very much” involved in patient treatment decisions (76%) (Table 1).
Symptom management (receipt of desired level of help for symptoms)
At both times 1 and 2, a higher percentage of proxies than patients reported inappropriate help for patient symptoms (time 1: 14% vs. 10%; time 2: 12% vs. 6%) (Table 1). Concordance as measured by Gwet’s AC1 (Gwet Reference Gwet2008) for this outcome was high (0.82 at time 1 and 0.86 at time 2); similarly, percent agreement and p_pos were >0.80 at both time points. p_neg was <0.5 at both time points. Cohen’s kappa (Cohen Reference Cohen1960) was low at both time points (0.32 at time 1 and 0.22 at time 2) (Table 2).
Presence of pain distress
Higher percentages of proxies than patients reported that patients experienced some distress from pain at both time 1 (59% vs. 52%) and time 2 (59% vs. 57%). However, at both time points, the majority of each respondent type reported distress from pain (Table 1). Concordance as measured by Gwet’s AC1 was higher at time 2 (0.86) than at time 1 (0.67), though >0.5 at both time points. p_pos was >0.80 at both time points (time 1: 0.81 and time 2: 0.91), as was percent agreement. p_neg (time 1: 0.85 and time 2: 0.94) and Cohen’s kappa were likewise high at both time points (time 1: 0.66 and time 2: 0.85) (Table 2).
Care quality
Most proxies and patients gave the top score for quality at both time points (time 1: 77% vs. 80% and time 2: 76% vs. 78%) (Table 1). Gwet’s AC1 showed high concordance (time 1: 0.67 and time 2: 0.70), as did p_pos (time 1: 0.86 and time 2: 0.87); percent agreement was >0.7 at both time points. However, p_neg (time 1: 0.47 and time 2: 0.56) and kappa (time 1: 0.33 and time 2: 0.43) were lower (Table 2).
Provider communication
At both time points, proxy and patient scores were consistently high, with mean scores of ∼9 regardless of respondent type (Table 1). Interestingly, although most respondents gave very high scores, the score range reported by patients was wider at both time points (time 1: 0–10 vs. 5–10 and time 2: 0–10 vs. 4–10) (data not shown). Furthermore, a lower percentage of proxies than patients gave the top score for communication at time 1 (58% vs. 69%), but the reverse was true at time 2 (53% vs. 49%) (Table 1). Correlation between respondents for the ordinal provider communication outcome was similar at both time points (time 1: τ = 0.366 and time 2: τ = 0.345) (data not shown). Concordance as measured by Gwet’s AC1 was <0.5 for this outcome at both time points (time 1: 0.44 and time 2: 0.26), although p_pos was >0.5 (time 1: 0.76 and time 2: 0.64), as was p_neg (time 1: 0.59 and time 2: 0.62). Kappa was likewise low (time 1: 0.36 and time 2: 0.26), while percent agreement was >0.6 at both time points (Table 2).
Provider help with symptom control
Both patients and proxies gave very high scores for provider help with symptom control at both time points, with mean scores of ∼9 regardless of respondent type (Table 1). At both times 1 and 2, over 50% of respondents of each type gave the highest score for provider efforts with symptom control (time 1: 64% vs. 67%; time 2: 59% vs. 59%) (Table 1). Correlation for the provider efforts with symptom control outcome increased over time (time 1: τ = 0.215 and time 2: τ = 0.486) (data not shown). Concordance as measured by Gwet’s AC1 likewise increased over time (time 1: 0.38 and time 2: 0.54), as did Cohen’s kappa (time 1: 0.18 and time 2: 0.51). p_pos was high at both time points (time 1: 0.74 and time 2: 0.80), and p_neg increased over time (time 1: 0.44 and time 2: 0.71), as did percent agreement (time 1: 0.65 and time 2: 0.76).
Reports of quality and experience within subgroups
Appropriateness of help received for symptoms
Most patients who reported receiving an inappropriate level of help for a symptom nonetheless reported the highest quality score (time 1: 75% and time 2: 80%), whereas this was not the case for proxies (time 1: 45% and time 2: 40%). However, this pattern was not seen for the communication scores. Among the subgroup of patients or proxies reporting inappropriate help for symptoms, a higher percentage of proxies than patients reported the highest communication score at time 1 (33% vs. 13%) but not at time 2 (30% vs. 40%). For the provider help with symptom control outcome, a higher percentage of patients than proxies endorsed a top score even when inappropriate help for symptoms was received, at both time 1 (38% vs. 25%) and time 2 (60% vs. 30%) (Table 3).
Pain distress
Regardless of the presence or absence of pain distress, >50% of proxies and patients reported the highest scores for all quality and experience outcomes at time 1. At time 2, >50% of both respondent types continued to report the highest possible quality and experience scores across distress levels, except for the communication score, where <50% of proxies (48%) and patients (49%) who reported no distress gave the highest score (Table 3).
Practice type
The percentage of top scores reported for the quality outcome was similar for patients and proxies, regardless of practice type, with higher percentages reported for private practice at both time points (Figure 1A). However, there were differences between respondent types for the symptom and communication outcomes. For both outcomes, the percentage of patients giving the highest score for comprehensive care sites was higher at time 1 than time 2, whereas the reverse was true for proxies. These between-respondent differences were not seen for private practice.
Mean scores for the communication and symptom outcomes were high across respondent and practice types. At time 1, mean communication scores for proxies and patients were only slightly higher for private practice (proxies: mean = 9.4, SD = 1.1 and patients: mean = 9.2, SD = 2.0) compared to comprehensive care sites (proxies: mean = 9.0, SD = 1.1 and patients: mean = 9.0, SD = 1.4). Similarly, respondent scores were close at time 2 (data not shown). The mean scores for symptom outcomes at time 1 were high for both proxies and patients for private practice (mean = 9.5, SD = 1.0 and mean = 9.3, SD = 1.4, respectively) and for comprehensive care sites (mean = 9.0, SD = 1.6 for both). Similar results were seen for time 2 (data not shown).
Comparing private practices and comprehensive care sites on symptom management and burden yielded broadly similar results regardless of respondent type or time point (Figure 1B). Specifically, pain distress was reported more frequently for comprehensive care sites, and proxy and patient reports of this were similar. High percentages of both proxies and patients reported the correct level of help for symptoms at both comprehensive care sites and private practices. However, at time 1, a higher percentage of patients reported an inappropriate level of help in private practices than in comprehensive care sites, whereas the reverse was true at time 2. In contrast, for proxies, the percentage was consistently higher for comprehensive care sites. Furthermore, a higher percentage of proxies than patients reported receipt of inappropriate help for symptoms regardless of site, although in general this was not frequent (<20% for all time points and respondent types).
Discussion
Most participants, regardless of respondent type, gave the highest possible quality and experience scores at both time points. At the same time, at least 10% (depending on respondent type) of participants indicated an unmet need for symptom management. Concordance was generally moderate to good. In subgroup analyses, there were some differences between proxies and patients in terms of reporting the highest score, but these were not consistent. Such differences have the potential to be influential in studies where the proportions of proxies differ between arms or sites.
Of note is that concordance for perceived communication was <0.5 despite proxies and patients giving generally high scores at both time points. Patient–proxy concordance on communication items was lower compared to concordance on more objective items in another study (Giovannetti et al. Reference Giovannetti, Reider and Wolff2013); however, as quality is not necessarily more objective than communication, it is unclear why communication would have lower concordance in our study. Because the communication question investigates the “likely outcomes of cancer care,” differences in expectations or perception of likely outcomes of cancer care between patients and proxies may be a possible explanation.
Previous research involving patient–proxy dyads to compare symptom assessment has generally found that proxies tend to overestimate patient symptoms (Ma et al. Reference Ma, Yu and Lu2021; McPherson et al. Reference McPherson, Wilson and Lobchuk2008; Silveira et al. Reference Silveira, Given and Given2010; Winters-Stone et al. Reference Winters-Stone, Lyons and Bennett2014), although there are also studies with findings of high levels of patient–proxy agreement (Akin and Durna Reference Akin and Durna2013; Armstrong et al. Reference Armstrong, Wefel and Gning2012) or of no differences (Miller et al. Reference Miller, Lyons and Bennett2015; Yeager et al. Reference Yeager, Lee and Bai2022). In this study, we found that higher percentages of proxies reported pain distress and the receipt of inappropriate help for symptoms, but the percentages of proxies and patients reporting pain distress were closer at time 2 than time 1, whereas the reverse was true for inappropriate help for symptoms. Additional longitudinal studies are needed to understand how proxy reporting may change over time.
Growing interest in patient-reported symptom measures as part of performance measurement in oncology (Stover et al. Reference Stover, Urick and Deal2020) highlights the potential importance of proxy ratings and potential differences between proxies and patients. The exclusion of patients who cannot report their own symptoms may result in a biased population, but lack of adjustment for proxy–patient differences may disadvantage services with higher proportions of proxies in between-service comparisons. In some surveys, the frequency of proxy use can vary across racial/ethnic groups (Pinheiro et al. Reference Pinheiro, Wheeler and Chen2015); if the sociodemographic characteristics of the population served vary by services or sites, this could affect the frequency of proxy use and the measured experience for those services or sites.
Additionally, patient experience measures can be used to inform improvements at a service level (Manalili and Santana Reference Manalili and Santana2021). Our findings, like those of Havyer et al. (Reference Havyer, van Ryn and Wilson2019), suggest agreement is not as high for scores other than the highest scores, which can pose challenges for the use of the data to inform quality improvement efforts (Havyer et al. Reference Havyer, van Ryn and Wilson2019) in the absence of guidelines for approaches that adjust for proxy reports. The development of such guidelines is an important area for future research. In addition to analytic methods, other considerations for data collection and future work include factors that may affect proxy reporting. Previous research has indicated that the proxy–patient relationship and the proxy’s engagement in care are associated with proxy reports of care experience and quality (Roydhouse et al. Reference Roydhouse, Gutman and Keating2018a).
This study has several limitations. The sample size was relatively small and drawn from practices in a few northeastern US states, which may limit generalizability. Additionally, the data were collected several years ago, and cancer treatment has changed in recent years. However, chemotherapy remains a mainstay of treatment, and the methodological focus of the study makes the age of the data less concerning. Other limitations related to the age of the data include subsequent improvements in symptom management and a greater focus on patient-centered care and communication.
Additionally, proxies in this study were instructed to report on the patient’s experiences with care, but it is not clear what perspective they took when doing so. Despite these limitations, this study has several strengths, including assessment at more than one time point. The relative paucity of longitudinal data on proxies has been recognized before (Roydhouse and Wilson Reference Roydhouse and Wilson2017; Sneeuw et al. Reference Sneeuw, Sprangers and Aaronson2002), and given increasing interest in PRO assessment throughout the cancer journey, studies that are not limited to single time points are important.
In conclusion, our findings suggest that the use of proxies to report on care quality and experience outcomes may not change estimates substantially. Proxy endorsement of top scores was only substantially lower for the subgroup of patients with perceived inappropriate symptom help. Should this group vary in size across sites, it is possible that estimates may be affected. Furthermore, although data from both proxies and patients were available in this study, this is unlikely to be the case in practice, as proxies will be required when patients are too ill to self-report. Because patients requiring proxies tend to be older and in poorer health (Roydhouse et al. Reference Roydhouse, Gutman and Keating2018b), development of methods to analyze datasets with information from both patients and proxies is important, particularly if these assessments inform evaluations of provider performance or are used for quality improvement.
Conflicts of interest
Jessica Roydhouse reports personal fees from Amgen, outside the submitted work, and consultancies with University of Birmingham Enterprise, outside the submitted work. Roee Gutman served as an expert witness to Janssen/J&J on an unrelated matter.