Published online by Cambridge University Press: 14 December 2023
Machine learning (ML) algorithms are computational procedures that learn patterns from previously categorized documents and use them to predict the category to which a new document belongs. The role of machine learning within cancer registries remains unclear, given the lack of in-depth testing and of guidance from health technology assessment (HTA) agencies. We evaluated the effectiveness of using machine learning to code new cases at the Integrated Cancer Registry.
The Integrated Cancer Registry covers the eastern area of Sicily in Italy, which has an average annual incidence of about 10,000 cases of malignant neoplasm. Potential new cancer cases were retrieved from pathology services and processed by pathologists, who confirmed the neoplastic nature of the suspected cases and specified the morphological type and location of the tumors. Under the current method, when International Classification of Diseases for Oncology information is not provided, cases are identified by reading the free-text report. We used the Microsoft ML.NET library, a framework developed to facilitate the use of machine learning pipelines in large software applications. A total of 1,050,952 free-text pathology reports published from 2003 to 2018 were selected separately from all Sicilian pathology services and uploaded to machine learning software that explored eight binary classification algorithms.
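As an illustration of binary classification of free-text reports, the sketch below trains a bag-of-words perceptron in plain Python. It is not the registry's ML.NET pipeline: the abstract does not name the eight algorithms, so the perceptron is a stand-in chosen for brevity, and the function names and toy reports are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a free-text report and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_perceptron(reports, labels, epochs=10):
    """Train a bag-of-words perceptron; labels are 1 (neoplastic) or 0."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in zip(reports, labels):
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            predicted = 1 if score > 0 else 0
            error = label - predicted  # +1, 0, or -1
            if error != 0:
                # Move the weights of the words in this report toward the label.
                for t in tokens:
                    weights[t] += error
                bias += error
    return weights, bias


def predict(model, text):
    """Return 1 if the report is classified as neoplastic, else 0."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else 0


# Invented toy reports, for illustration only (not registry data).
train_reports = [
    "invasive ductal carcinoma of the breast",
    "adenocarcinoma of the colon, moderately differentiated",
    "chronic gastritis, no evidence of malignancy",
    "benign hyperplastic polyp",
]
train_labels = [1, 1, 0, 0]

model = train_perceptron(train_reports, train_labels)
```

Calling `predict(model, report_text)` on a new report returns the predicted class. A production system would need far more training data, a held-out test set, and a richer text representation than single words.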
We evaluated each algorithm’s performance on the test dataset by counting the true positives, true negatives, false positives, and false negatives produced by the classification procedure. The metrics used were accuracy, F1 score, and area under the receiver operating characteristic curve. On a test set of around 210,000 text diagnoses (about one-fifth of the reports), the algorithms reached F1 scores of up to 95 percent.
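For readers unfamiliar with these metrics, the following sketch shows how they are conventionally computed for a binary classifier. The labels and scores are invented toy values, the 0.5 decision threshold is an assumption, and the function names are illustrative; none of this reproduces the registry's data or its ML.NET evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: true labels and classifier scores for eight reports.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, tn, fp, fn))
print("F1 score:", f1_score(tp, fp, fn))
print("ROC AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is one reason the three are often reported together.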
Machine learning algorithms capture relevant information about tumors from free-text pathology reports, optimizing the process and reducing waste. With the help of machine learning systems, cancer registries can provide more timely data for research and evaluation of all types of new cancer technologies (drugs, devices, radiology and radiotherapy equipment, diagnostic devices, robotic surgery, and vaccines).