OP135 Machine Learning And Cancer Registry: Evaluation Of The Effectiveness Of Case Coding

Carmelo Ettore Viscosi; Alessia Anna Di Prima; Antonina Torrisi; Antonietta Alfia Torrisi; Margherita Ferrante; Rosalia Ragusa

doi:10.1017/S0266462323001381

Introduction

Machine learning (ML) algorithms are computational procedures that learn patterns from previously categorized documents and use them to predict the category to which a new document belongs. The role of machine learning within cancer registries remains unclear given the lack of in-depth testing and guidance from health technology assessment (HTA) agencies. We evaluated the effectiveness of coding new cases through machine learning at the Integrated Cancer Registry.

Methods

The Integrated Cancer Registry covers the eastern area of Sicily in Italy, which has an annual average incidence of about 10,000 cases of malignant neoplasm. Potential new cancer cases were retrieved from pathology services and processed by pathologists, who confirmed the neoplastic nature of suspected cases and specified the morphological type and location of the tumors. When International Classification of Diseases for Oncology information is not provided, the current method involves identifying it by reading the free-text report. We used the new Microsoft ML.NET library, a framework developed in response to the challenge of facilitating machine learning pipeline utilization in large software applications. A total of 1,050,952 free-text pathology reports issued from 2003 to 2018 were selected separately from all Sicilian pathology services and uploaded to machine learning software that explored eight binary classification algorithms.
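As an illustration of what binary classification of free-text reports involves, the following is a minimal sketch in Python using a multinomial naive Bayes model with add-one smoothing. It is not the ML.NET pipeline used in the study, and the abstract does not name the eight algorithms explored; the function names and the toy reports below are invented for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it on whitespace."""
    return text.lower().split()


def train(reports):
    """Fit a multinomial naive Bayes model on (text, is_malignant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in reports:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return True if the malignant class has the higher log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]


# Invented toy reports (hypothetical, not registry data).
labelled = [
    ("infiltrating ductal carcinoma of the breast", True),
    ("adenocarcinoma of the colon moderately differentiated", True),
    ("squamous cell carcinoma of the lung", True),
    ("chronic gastritis no evidence of malignancy", False),
    ("benign nevus no atypia", False),
    ("hyperplastic polyp of the colon no dysplasia", False),
]

model = train(labelled)
prediction_malignant = predict(model, "invasive carcinoma of the colon")
prediction_benign = predict(model, "chronic gastritis no atypia")
```

A real registry workflow would need proper text normalization, a far larger labelled corpus, and a held-out test set; the sketch only shows the shape of the task, in which previously categorized reports are used to label a new one.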

Results

We evaluated each algorithm's performance on the test dataset from the numbers of true positives, true negatives, false positives, and false negatives produced by the classification procedure. The metrics used were accuracy, F1 score, and area under the receiver operating characteristic curve. With a test set of around 210,000 text diagnoses, the algorithms reached F1 scores of up to 95 percent.
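For readers unfamiliar with these metrics, the sketch below shows how accuracy, F1 score, and area under the receiver operating characteristic curve can be derived from predictions on a test set. It uses the standard textbook definitions; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration and are not data from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half. This pairwise form is quadratic in the number of
    cases, so it suits small illustrations rather than large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy test set (hypothetical): 1 = malignant, 0 = not malignant.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the chosen decision threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is one reason the three are commonly reported together.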

Conclusions

Machine learning algorithms capture relevant information about tumors from free-text pathology reports, optimizing the process and reducing waste. With the help of machine learning systems, cancer registries can provide more timely data for research and evaluation of all types of new cancer technologies (drugs, devices, radiology and radiotherapy equipment, diagnostic devices, robotic surgery, and vaccines).
