Comparison of rule-based and neural network models for negation detection in radiology reports

Published online by Cambridge University Press:  18 November 2020

D. Sykes*
Affiliation:
Division of Psychiatry, Centre for Clinical Brain Sciences
A. Grivas
Affiliation:
Institute for Language, Cognition and Computation, School of Informatics
C. Grover
Affiliation:
Institute for Language, Cognition and Computation, School of Informatics
R. Tobin
Affiliation:
Institute for Language, Cognition and Computation, School of Informatics
C. Sudlow
Affiliation:
Usher Institute of Population Health Sciences and Informatics
W. Whiteley
Affiliation:
Centre for Clinical Brain Sciences, Edinburgh Medical School
A. Mcintosh
Affiliation:
Division of Psychiatry, Centre for Clinical Brain Sciences
H. Whalley
Affiliation:
Division of Psychiatry, Centre for Clinical Brain Sciences
B. Alex
Affiliation:
Institute for Language, Cognition and Computation, School of Informatics Edinburgh Futures Institute, School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, UK
*Corresponding author. E-mail: [email protected]

Abstract

Using natural language processing, it is possible to extract structured information from raw text in the electronic health record (EHR) at reasonably high accuracy. However, accurately distinguishing negated from non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or are only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports; https://www.ltg.ed.ac.uk/software/edie-r/) that was developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText (Harkema et al. 2009. Journal of Biomedical Informatics 42(5), 839–851), a Python implementation of a generalisation of NegEx, and NegBio (Peng et al. 2017. NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898), which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods.
EdIE-R-Neg achieved the highest F1 score, particularly on the Tayside test set, a dataset from which no development data were used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap between the machine learning models and EdIE-R-Neg on the Tayside test set narrowed when Tayside development data were added to the ESS training set, demonstrating the adaptability of the neural network models.
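
To make the negated versus non-negated distinction concrete, the following is a minimal sketch of trigger-based negation detection in the general spirit of NegEx-style rules. It is not the EdIE-R-Neg, pyConText, or NegBio implementation: the trigger list, the five-token window, and the function names (`tokenize`, `contains`, `is_negated`) are all assumptions made for illustration.

```python
import re

# Hypothetical trigger list for illustration; real systems curate far larger ones.
NEGATION_TRIGGERS = [("no",), ("not",), ("without",), ("no", "evidence", "of")]
WINDOW = 5  # assumed scope: number of tokens inspected before each mention


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(tokens, phrase):
    """Return True if the token tuple `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def is_negated(sentence, term):
    """Return True if any mention of `term` has a trigger within WINDOW tokens before it."""
    tokens = tokenize(sentence)
    target = tuple(tokenize(term))
    n = len(target)
    for start in range(len(tokens) - n + 1):
        if tuple(tokens[start:start + n]) != target:
            continue  # not a mention of the term at this position
        preceding = tokens[max(0, start - WINDOW):start]
        if any(contains(preceding, trigger) for trigger in NEGATION_TRIGGERS):
            return True
    return False
```

Under these assumptions, `is_negated("There is no evidence of acute infarct.", "infarct")` returns `True`, while `is_negated("Old infarct in the left hemisphere.", "infarct")` returns `False`. A fixed window like this ignores syntax and hypothesised mentions, which is one reason the systems compared in the article rely on richer rules or learned models.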

Type
Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press

References

Alex, B., Grover, C., Tobin, R., Sudlow, C., Mair, G. and Whiteley, W. (2019). Text mining brain imaging reports. Journal of Biomedical Semantics 10, 23.
Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T. and McDermott, M. (2019). Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, Minnesota, USA. Association for Computational Linguistics, pp. 72–78.
Ba, J.L., Kiros, J.R. and Hinton, G.E. (2016). Layer normalization. arXiv e-prints, p. arXiv:1607.06450.
Bengio, Y., Simard, P. and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2), 157–166.
Chapman, W., Bridewell, W., Hanbury, P., Cooper, G.F. and Buchanan, B. (2001). A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics 34, 301–310.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1), 37–46.
Cornegruta, S., Bakewell, R., Withey, S. and Montana, G. (2016). Modelling radiological language with bidirectional long short-term memory networks. CoRR, abs/1609.08409.
Cruz, N.P., Taboada, M. and Mitkov, R. (2017). A machine-learning approach to negation and speculation detection for sentiment analysis. Journal of the Association for Information Science and Technology 67(9), 2118–2136.
Fancellu, F., Lopez, A. and Webber, B. (2016). Neural networks for negation scope detection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany. Association for Computational Linguistics, pp. 495–504.
Gorinski, P.J., Wu, H., Grover, C., Tobin, R., Talbot, C., Whalley, H., Sudlow, C., Whiteley, W. and Alex, B. (2019). Named entity recognition for electronic health records: A comparison of rule-based and machine learning approaches. arXiv e-prints, p. arXiv:1903.03985.
Goryachev, S., Sordo, M., Zeng, Q.T. and Ngo, L. (2006). Implementation and Evaluation of Four Different Methods of Negation Detection. Boston, MA: DSG.
Graves, A. and Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18(5–6), 602–610.
Grivas, A., Alex, B., Grover, C., Tobin, R. and Whiteley, W. (2020). Not a cute stroke: Analysis of Rule- and Neural Network-Based Information Extraction Systems for Brain Radiology Reports. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis (LOUHI 2020) at EMNLP 2020.
Grover, C. and Tobin, R. (2006). Rule-based chunking and reusability. In Proceedings of LREC 2006, pp. 873–878.
Harkema, H., Dowling, J.N., Thornblade, T. and Chapman, W.W. (2009). Context: An algorithm for determining negation, experiencer, and temporal status from clinical reports. Journal of Biomedical Informatics 42(5), 839–851. Biomedical Natural Language Processing.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation 9(8), 1735–1780.
Horng, S., Sontag, D.A., Halpern, Y., Jernite, Y., Shapiro, N.I. and Nathanson, L.A. (2017). Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLOS ONE 12(4), 1–16.
Hripcsak, G. and Rothschild, A.S. (2005). Agreement, the f-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association 12(3), 296–298.
Huang, Y. and Lowe, H. (2007). A novel hybrid approach to automated negation detection in clinical radiology reports. Journal of the American Medical Informatics Association: JAMIA 14, 304–311.
Jackson, C., Crossland, L., Dennis, M., Wardlaw, J. and Sudlow, C. (2008). Assessing the impact of the requirement for explicit consent in a hospital-based stroke study. QJM: Monthly Journal of the Association of Physicians 101(4), 281–289.
Kingma, D.P. and Ba, J. (2015). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May, 2015, Conference Track Proceedings.
Maldonado, R., Goodwin, T. and Harabagiu, S.M. (2017). Active deep learning-based annotation of electroencephalography reports for cohort identification. In CRI, vol. 2017, pp. 229–238.
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J. and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pp. 55–60.
Mehrabi, S., Krishnan, A., Sohn, S., Roch, A.M., Schmidt, H., Kesterson, J., Beesley, C., Dexter, P., Schmidt, C.M., Liu, H. and Palakal, M. (2015). Deepen: A negation detection system for clinical text incorporating dependency relation into negex. Journal of Biomedical Informatics 54, 213–219.
Mou, L., Meng, Z., Yan, R., Li, G., Xu, Y., Zhang, L. and Jin, Z. (2016). How transferable are neural networks in NLP applications? arXiv e-prints, p. arXiv:1603.06111.
Mutalik, P., Deshpande, A.M. and Nadkarni, P.M. (2001). Research paper: Use of general-purpose negation detection to augment concept indexing of medical documents: A quantitative study using the UMLS. Journal of the American Medical Informatics Association: JAMIA 8(6), 598–609.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. and Lerer, A. (2017). Automatic differentiation in pytorch. In NIPS-W.
Peng, Y., Wang, X., Lu, L., Bagheri, M., Summers, R. and Lu, Z. (2017). NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898.
Peng, Y., Yan, K., Sandfort, V., Summers, R.M. and Lu, Z. (2019). A self-attention based deep learning method for lesion attribute detection from ct reports. arXiv preprint arXiv:1904.13018.
Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine 6(3), 21–45.
Pons, E., Braun, L.M.M., Hunink, M.G.M. and Kors, J.A. (2016). Natural language processing in radiology: A systematic review. Radiology 279(2), 329–343.
Pratt, L.Y., Mostow, J. and Kamm, C.A. (1991). Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence - Volume 2, AAAI91. AAAI Press, pp. 584–589.
Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S. and Tsujii, J. (2012). brat: A web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France. Association for Computational Linguistics, pp. 102–107.
Taylor, S. and Harabagiu, S. (2018). The role of a deep-learning method for negation detection in patient cohort identification from electroencephalography reports. Proceedings of the AMIA Annual Symposium 2018, 1018–1027.
Tjong Kim Sang, E.F. (2002). Introduction to the conll-2002 shared task: Language-independent named entity recognition. In Proceedings of the 6th Conference on Natural Language Learning - Volume 20, COLING-02, Stroudsburg, PA, USA. Association for Computational Linguistics, pp. 1–4.
Uzuner, Ö., South, B.R., Shen, S. and DuVall, S.L. (2011). 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association 18(5), 552–556.
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M. and Summers, R.M. (2017). Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. CoRR.
Wu, S., Miller, T., Masanz, J., Coarr, M., Halgrim, S., Carrell, D. and Clark, C. (2014). Negation’s not solved: Generalizability versus optimizability in clinical natural language processing. PLOS ONE 9(11), 1–11.