
360 Using machine learning to analyze voice and detect aspiration

Published online by Cambridge University Press:  11 April 2025

Cyril Varghese, Mayo Clinic
Jianwei Zhang, Mayo Clinic
Sara A. Charney, Mayo Clinic
Abdelmohaymin Abdalla, Mayo Clinic
Stacy Holyfield, Mayo Clinic
Adam Brown, Mayo Clinic
Hunter Stearns, Mayo Clinic
Michelle Higgins, Mayo Clinic
Julie Liss, Mayo Clinic
Nan Zhang, Mayo Clinic
David G. Lott, Mayo Clinic
Victor E. Ortega, Mayo Clinic
Visar Berisha, Mayo Clinic Arizona and Arizona State University

Abstract


Objectives/Goals: Aspiration causes or aggravates lung disease. Bedside swallow evaluations are neither sensitive nor specific, while the gold-standard tests for aspiration are invasive, uncomfortable, expose patients to radiation, and are resource intensive. We propose the development and validation of an AI model that analyzes voice to noninvasively predict aspiration.

Methods/Study Population: Retrospectively recorded [i] phonations from 163 unique ENT patients were analyzed for acoustic features including jitter, shimmer, and harmonics-to-noise ratio (HNR). Based on videofluoroscopic swallow study (VFSS) findings, patients were classified into three groups: aspirators (Penetration-Aspiration Scale, PAS 6–8), probable aspirators (PAS 3–5), and non-aspirators (PAS 1–2). Multivariate analysis evaluated patient demographics, history of head and neck surgery, radiation, neurological illness, obstructive sleep apnea, esophageal disease, body mass index, and vocal cord dysfunction. Supervised machine learning using five-fold cross-validated neural additive modeling (NAM) was performed on the phonations of aspirators versus non-aspirators. The model was then validated on an independent, external database.

Results/Anticipated Results: Aspirators had quantifiably worse voice quality, with higher jitter and shimmer but lower HNR. NAM modeling classified aspirators and non-aspirators as distinct groups (aspirator NAM risk score 0.528 ± 0.248 (mean ± SD) vs. non-aspirator (control) risk score 0.252 ± 0.241 (mean ± SD); p

Discussion/Significance of Impact: We report the use of voice as a novel, noninvasive biomarker of aspiration risk using machine learning techniques. This tool has the potential to enable safe, early detection of aspiration in a variety of clinical settings, including intensive care units, wards, outpatient clinics, and remote monitoring.
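The jitter and shimmer features named in the Methods are standard voice-perturbation measures; the abstract does not describe the authors' exact extraction pipeline, so the following is only a minimal sketch of the textbook "local" definitions, assuming per-cycle glottal periods and peak amplitudes have already been estimated from the sustained [i] phonation:

```python
import numpy as np

def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal-cycle periods, normalized by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Local shimmer: the analogous perturbation measure computed
    on per-cycle peak amplitudes instead of periods."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Illustrative values only (seconds per cycle / arbitrary amplitude units):
periods = [0.010, 0.011, 0.010, 0.011]
amps = [1.00, 0.95, 1.05, 0.98]
print(local_jitter(periods), local_shimmer(amps))
```

Higher values of either measure indicate a less periodic, "rougher" voice, which is the direction of change the Results report for aspirators.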

Type
Informatics, AI and Data Science
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2025. The Association for Clinical and Translational Science