358 Rare disease study identification (RDSI): A natural language processing assisted search and visualization tool for clinical studies of rare diseases
Published online by Cambridge University Press: 11 April 2025
Abstract
Objectives/Goals: Identifying and indexing rare disease studies is labor-intensive, especially at research centers that run a large number of trials. To address this challenge, we applied natural language processing (NLP) and visualization techniques to develop an efficient pipeline and a user-friendly web interface. Our goal is to make the rare disease study identification (RDSI) tool available for adoption by other sites.
Methods/Study Population: The RDSI retrieves study information (short and long titles and the study abstract) from the IRB system. These descriptive fields are then processed by the MetaMap Lite NLP program, which identifies disease terms and normalizes them to UMLS concepts. Through terminology identifier mapping, diseases that match concepts in rare disease databases (the Genetic and Rare Disease program and Orphanet) are then scored to pinpoint studies that focus on a rare disease. The web interface displays a scatter bubble chart as an overview of all rare diseases, with each bubble sized in proportion to the number of studies for that disease. In addition to this visual navigation, users can search for studies by disease name, PI, or IRB number. Search results contain detailed study information as well as the evidence used by the pipeline's algorithms.
Results/Anticipated Results: The RDSI identification results and functions were verified manually and spot-checked by several study investigators. The web interface is a self-contained solution available to our staff for use cases such as reporting and environmental scans. We have built in a versioning mechanism that logs the date of each major result in the process. Therefore, even as the rare disease data sources evolve over time, we can preserve historical context or perform updates as needed. The RDSI outputs are replicated daily to Mayo Clinic's enterprise data warehouse, allowing technically proficient users to reuse intermediate results in the back end. We anticipate that advances in AI technology will further improve the performance of rare disease identification.
Discussion/Significance of Impact: The RDSI is an informatics solution that makes identifying and navigating rare disease clinical studies more efficient. It relies on public databases and open-source tools, demonstrating a return on investment from the broader translational science ecosystem. These design considerations can inform, and be adopted by, other institutions.
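The abstract states that diseases matching concepts in the rare disease databases are scored to pinpoint studies focused on a rare disease, but it does not give the scoring rule. The following is a minimal sketch under stated assumptions, not the RDSI implementation: it assumes the NLP step yields (field, CUI) pairs, that GARD and Orphanet identifiers have already been cross-referenced to UMLS CUIs as (source, source_id, cui) rows, and that a mention in a title outweighs one in the abstract. The names `rare_cui_set`, `score_study`, and `focus_diseases`, the field weights, the threshold, and all identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical field weights: a disease named in a title is assumed to be a
# stronger signal of study focus than one mentioned only in the abstract.
FIELD_WEIGHTS = {"short_title": 3.0, "long_title": 2.0, "abstract": 1.0}


def rare_cui_set(xrefs):
    """Collect UMLS CUIs from (source, source_id, cui) cross-reference rows,
    e.g. GARD or Orphanet identifiers mapped to UMLS (row format assumed)."""
    return {cui for _source, _source_id, cui in xrefs}


def score_study(mentions, rare_cuis):
    """Sum field weights for each rare disease CUI found in one study.

    mentions: (field, cui) pairs produced by the NLP step (format assumed).
    Concepts not in rare_cuis are ignored.
    """
    scores = defaultdict(float)
    for field, cui in mentions:
        if cui in rare_cuis:
            scores[cui] += FIELD_WEIGHTS.get(field, 0.0)
    return dict(scores)


def focus_diseases(mentions, rare_cuis, threshold=3.0):
    """Return rare disease CUIs scoring at or above an assumed threshold,
    highest score first (ties broken by CUI for a stable order)."""
    scores = score_study(mentions, rare_cuis)
    hits = [cui for cui, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda cui: (-scores[cui], cui))


# Placeholder identifiers, not real GARD, Orphanet, or UMLS codes.
xrefs = [("GARD", "gard-1", "CUI_A"), ("Orphanet", "orpha-1", "CUI_B")]
mentions = [("short_title", "CUI_A"), ("abstract", "CUI_A"),
            ("abstract", "CUI_B"), ("abstract", "CUI_C")]
rare = rare_cui_set(xrefs)
print(score_study(mentions, rare))     # {'CUI_A': 4.0, 'CUI_B': 1.0}
print(focus_diseases(mentions, rare))  # ['CUI_A']
```

Weighting by field is one plausible way to separate studies that focus on a rare disease from studies that merely mention one; the actual RDSI criteria may differ.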
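The overview chart sizes each bubble by the number of studies for that disease. The abstract does not say whether "size" means radius or area; this hypothetical sketch assumes area, so the radius grows with the square root of the study count. The function name `bubble_sizes`, its input shape, and the `max_radius` parameter are illustrative assumptions.

```python
import math
from collections import Counter


def bubble_sizes(study_diseases, max_radius=40.0):
    """Map each disease to (study_count, radius) for a bubble chart.

    study_diseases: dict of study ID -> list of rare disease names (assumed input).
    Radius is scaled so bubble AREA is proportional to study count,
    with the most-studied disease drawn at max_radius.
    """
    counts = Counter()
    for diseases in study_diseases.values():
        counts.update(set(diseases))  # count a study at most once per disease
    if not counts:
        return {}
    largest = max(counts.values())
    return {d: (n, max_radius * math.sqrt(n / largest)) for d, n in counts.items()}


studies = {
    "IRB-1": ["Disease X"],
    "IRB-2": ["Disease X", "Disease Y"],
    "IRB-3": ["Disease X"],
    "IRB-4": ["Disease X", "Disease X"],  # duplicate mention, counted once
}
print(bubble_sizes(studies))  # {'Disease X': (4, 40.0), 'Disease Y': (1, 20.0)}
```

Scaling area rather than radius keeps a few heavily studied diseases from visually swamping the rest: a disease with four times as many studies gets a bubble with twice the radius, not four times.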
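Users can also search by disease name, PI, or IRB number, and results include the evidence behind each match. The RDSI's storage and query layer are not described, so this is only a minimal in-memory sketch built on a hypothetical `Study` record and `search` function: disease and PI queries use case-insensitive substring matching, while IRB numbers require an exact (case-insensitive) match.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Study:
    """Hypothetical study record; field names are illustrative."""
    irb_number: str
    pi: str
    title: str
    diseases: List[str] = field(default_factory=list)
    # Evidence per disease, e.g. the text spans that triggered the match.
    evidence: Dict[str, List[str]] = field(default_factory=dict)


def search(studies, query, by="disease"):
    """Filter studies by disease name, PI, or IRB number."""
    q = query.strip().lower()
    if by == "disease":
        return [s for s in studies if any(q in d.lower() for d in s.diseases)]
    if by == "pi":
        return [s for s in studies if q in s.pi.lower()]
    if by == "irb":
        return [s for s in studies if s.irb_number.lower() == q]
    raise ValueError(f"unsupported search field: {by!r}")


studies = [
    Study("IRB-001", "Smith", "Natural history of Disease X", ["Disease X"],
          {"Disease X": ["short title: Disease X"]}),
    Study("IRB-002", "Jones", "Registry for Disease Y", ["Disease Y"]),
]
for s in search(studies, "disease x"):
    print(s.irb_number, s.evidence)  # IRB-001 {'Disease X': ['short title: Disease X']}
```

Returning the stored evidence alongside each hit mirrors the abstract's point that results show why the pipeline flagged a study, which helps reviewers spot-check matches.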
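The versioning mechanism is described only as logging the date of each major result so that historical context survives as the source databases change. One way to support that, sketched here under that assumption, is an append-only, date-stamped log per pipeline step with an "as of" lookup. The `VersionedResults` class, its methods, and the step name are hypothetical.

```python
from datetime import date


class VersionedResults:
    """Append-only, date-stamped log of results per pipeline step (hypothetical)."""

    def __init__(self):
        self._log = {}  # step name -> list of (run_date, result), sorted by date

    def record(self, step, result, run_date):
        entries = self._log.setdefault(step, [])
        entries.append((run_date, result))
        # Stable sort on date only: same-day entries keep insertion order,
        # so the later recording wins in as_of().
        entries.sort(key=lambda entry: entry[0])

    def as_of(self, step, when):
        """Latest result for `step` recorded on or before `when`, else None."""
        latest = None
        for run_date, result in self._log.get(step, []):
            if run_date > when:
                break
            latest = result
        return latest


log = VersionedResults()
log.record("rare_cui_set", {"CUI_A"}, date(2024, 1, 15))
log.record("rare_cui_set", {"CUI_A", "CUI_B"}, date(2025, 1, 15))
print(log.as_of("rare_cui_set", date(2024, 6, 1)))  # {'CUI_A'}
```

Because nothing is overwritten, a report produced in mid-2024 can be reproduced later even after the rare disease concept set has been updated.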
- Type: Informatics, AI and Data Science
- Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
- Copyright: © The Author(s), 2025. The Association for Clinical and Translational Science