
218 Retrospective comparative analysis of prostate cancer in-basket messages: Responses from closed-domain LLM vs. clinical teams

Published online by Cambridge University Press:  11 April 2025

Yuexing Hao, Jason M. Holmes, Jared Hobson, Alexandra Bennett, Daniel K. Ebner, Mark R. Waddle and Wei Liu
Affiliation: Mayo Clinic

Abstract


Objectives/Goals: Our objective is to evaluate RadOnc-GPT, a GPT-4o-powered LLM, in generating responses to in-basket messages related to prostate cancer treatment in the Radiation Oncology department. By integrating it with electronic health record (EHR) systems, we aim to assess its impact on clinician workload, response quality, and efficiency in healthcare communication.

Methods/Study Population: RadOnc-GPT was integrated with patient EHRs from both hospital-wide and radiation-oncology-specific databases. The study examined 158 pre-recorded in-basket message interactions from 90 patients with non-metastatic prostate cancer. Quantitative natural language processing analysis and two randomized, single-blinded grading studies, involving four clinicians and four nurses, were conducted to evaluate the quality of RadOnc-GPT's responses in completeness, correctness, clarity, empathy, and estimated editing time. Response times were measured to estimate the time saved for clinicians and nurses. The study population included patient messages across all phases of care (pre-, during, and post-treatment) for patients undergoing radiotherapy.

Results/Anticipated Results: In the single-blinded grading study, clinician graders evaluated 316 responses (158 from human care teams and 158 from RadOnc-GPT). RadOnc-GPT outperformed human responses in empathy and clarity, while humans excelled in completeness and correctness. Sentiment analyses using TextBlob and VADER showed that RadOnc-GPT responses had a positive mean score of 0.25, whereas human responses clustered around neutral. VADER analysis indicated a high median compound score for RadOnc-GPT, nearing 1.0, reflecting predominantly positive sentiment, while human responses spanned a broader range, indicating sensitivity to context. Clinicians averaged 3.60 minutes (SD 1.44) per response, compared with 6.39 minutes (SD 4.05) for nurses, indicating the staff time that automated drafting with RadOnc-GPT could recover.

Discussion/Significance of Impact: RadOnc-GPT generated responses to individualized patient in-basket messages that were comparable to those from radiation oncologists and nurses. While human oversight remains necessary to catch errors, RadOnc-GPT can shorten response times and reduce pressure on care teams, shifting their role from drafting replies to reviewing them.
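The abstract does not publish RadOnc-GPT's implementation, so the following is only a hypothetical sketch of how a GPT-4o-backed in-basket responder might be wired. The prompt wording, the draft_reply helper, and the patient_context argument are illustrative assumptions, not the study's code.

```python
# Hypothetical sketch only: RadOnc-GPT's real pipeline and EHR integration
# are not described in this abstract. The prompt text, the draft_reply helper,
# and the patient_context argument are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_reply(patient_context: str, message: str) -> str:
    """Draft a reply to a patient in-basket message for clinician review."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You draft empathetic, accurate replies to radiation "
                    "oncology patient messages. Use only the supplied chart "
                    "context, and flag anything needing clinician attention."
                ),
            },
            {
                "role": "user",
                "content": f"Chart context:\n{patient_context}\n\n"
                           f"Patient message:\n{message}",
            },
        ],
    )
    return response.choices[0].message.content
```

Consistent with the Discussion, any such draft would be reviewed and edited by a clinician or nurse before being sent.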
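By contrast, the two sentiment measures cited in the Results are standard open-source tools, so their use can be sketched more concretely. Below is a minimal example, assuming the textblob and vaderSentiment packages are installed; the example message and the score_response helper are illustrative, not study data.

```python
# Minimal sketch of scoring one reply with TextBlob and VADER; the example
# message and the score_response helper are illustrative, not study data.
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def score_response(text: str) -> dict:
    """Return TextBlob polarity and VADER compound score, both in [-1, 1]."""
    analyzer = SentimentIntensityAnalyzer()
    return {
        "textblob_polarity": TextBlob(text).sentiment.polarity,
        "vader_compound": analyzer.polarity_scores(text)["compound"],
    }

print(score_response(
    "Thank you for reaching out. Mild fatigue is common during radiotherapy, "
    "and we are happy to discuss it at your next visit."
))
```

The polarity and compound values here are, presumably, the per-response scores behind the mean and median figures summarized in the Results.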

Type
Evaluation
Creative Commons
CC BY-NC-ND 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2025. The Association for Clinical and Translational Science