This paper describes a natural language text extraction system, called MEDLEE, that has been applied to the medical domain. The system extracts, structures, and encodes clinical information from textual patient reports. It was integrated with the Clinical Information System (CIS), which was developed at Columbia-Presbyterian Medical Center (CPMC) to help improve patient care. MEDLEE is currently used on a daily basis to routinely process radiological reports of patients at CPMC.
In order to describe how the natural language system was made compatible with the existing CIS, this paper will also discuss engineering issues which involve performance, robustness, and accessibility of the data from the end users' viewpoint.
Also described are the three evaluations that have been performed on the system. The first evaluation was useful primarily for further refinement of the system. The two other evaluations involved an actual clinical application which consisted of retrieving reports that were associated with specified diseases. Automated queries were written by a medical expert based on the structured output forms generated as a result of text processing. The retrievals obtained by the automated system were compared to the retrievals obtained by independent medical experts who read the reports manually to determine whether they were associated with the specified diseases. MEDLEE was shown to perform comparably to the experts. The technique used to perform the last two evaluations was found to be a realistic evaluation technique for a natural language processor.