Predicting the Utility of Scientific Articles for Emerging Pandemics Using Their Titles and Natural Language Processing

Kinga Dobolyi; Sidra Hussain; Grady McPeak

doi:10.1017/dmp.2024.109

Predicting the Utility of Scientific Articles for Emerging Pandemics Using Their Titles and Natural Language Processing

Published online by Cambridge University Press: 10 May 2024

Kinga Dobolyi ,

Sidra Hussain and

Grady McPeak

Show author details

Kinga Dobolyi*: Affiliation:
Department of Computer Science, George Washington University, Washington, DC, USA
Sidra Hussain: Affiliation:
Department of Computer Science, George Washington University, Washington, DC, USA
Grady McPeak: Affiliation:
Department of Computer Science, George Washington University, Washington, DC, USA
*: Corresponding author: Kinga Dobolyi, PhD; Email: [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Objective:

Not all scientific publications are equally useful to policy-makers tasked with mitigating the spread and impact of diseases, especially at the start of novel epidemics and pandemics. The urgent need for actionable, evidence-based information is paramount, but the nature of preprint and peer-reviewed articles published during these times is often at odds with such goals. For example, a lack of novel results and a focus on opinions rather than evidence were common in coronavirus disease (COVID-19) publications at the start of the pandemic in 2019. In this work, we seek to automatically judge the utility of these scientific articles, from a public health policy making persepctive, using only their titles.

Methods:

Deep learning natural language processing (NLP) models were trained on scientific COVID-19 publication titles from the CORD-19 dataset and evaluated against expert-curated COVID-19 evidence to measure their real-world feasibility at screening these scientific publications in an automated manner.

Results:

This work demonstrates that it is possible to judge the utility of COVID-19 scientific articles, from a public health policy-making perspective, based on their title alone, using deep natural language processing (NLP) models.

Conclusions:

NLP models can be successfully trained on scienticic articles and used by public health experts to triage and filter the hundreds of new daily publications on novel diseases such as COVID-19 at the start of pandemics.

Keywords

natural language processing pandemic policy public health scientific articles utility

Type: Original Research
Information: Disaster Medicine and Public Health Preparedness , Volume 18 , 2024 , e103

DOI: https://doi.org/10.1017/dmp.2024.109 [Opens in a new window]
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of Society for Disaster Medicine and Public Health, Inc

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Odone, A, Galea, S, Stuckler, D, Signorelli, C. The first 10000 COVID-19 papers in perspective: are we publishing what we should be publishing? Eur J Public Health. 2020;30(5):849-850. https://doi.org/10.1093/eurpub/ckaa170 Google Scholar

Raynaud, M, Zhang, H, Louis, K, et al. COVID-19-related medical research: a meta-research & critical appraisal. BMC Med Res Methodol. 2021;21(1):2313-2349. https://doi.org/10.1186/s12874-020-01190-w Google Scholar

Jalali, R, Hosseinian-Far, A, Mohammadi, M. Contradictions in the promotion of publishing academic & scientific journal articles, & the inability to cope with the new coronavirus (COVID-19). Antimicrob Resist Infect Control. 2021;10(1). Published online 12 January 2021. https://doi.org/10.1186/s13756-021-00884-0 Google Scholar

Mohammed, M, Sha’aban, A, Jatau, AI, et al. Assessment of COVID-19 information overload among the general public. J Racial Ethn Health Disparities. 2022;9(1):184-192. https://doi.org/10.1007/s40615-020-00942-0 Google Scholar

Bai, X, Liu, H, Zhang, F, et al. An overview on evaluating and predicting scholarly article impact. Information. 2017;8(17). Published online 25 June 2017. https://doi.org/10.3390/info8030073 Google Scholar

Rossi, MJ, Brand, JC. Journal article titles impact their citation rates. Arthroscopy. 2020;36(7):2025-2029. https://doi.org/10.1016/j.arthro.2020.02.018 Google Scholar

Beranová, L, Joachimiak, MP, Kliegr, T, et al. Why was this cited? Explainable machine learning applied to COVID-19 research literature. Scientometrics. 2022;127:2313-2349. https://doi.org/10.1007/s11192-022-04314-9 Google Scholar

COVID-19. COVID19 subreddit. Published 2020. Accessed February 1, 2020–July 31, 2020. https://www.reddit.com/r/COVID19/ Google Scholar

Master Question List for COVID-19. US Department of Homeland Security. Published 2020. Accessed December 21, 2020. https://www.dhs.gov/publication/st-master-question-list-COVID-19 Google Scholar

Wang, LL, Lo, K, Chandrasekhar, Y, et al. CORD-19: The COVID-19 open research dataset. Preprint. ArXiv. Published online April 22, 2020.Google Scholar

Devlin, J, Chang, M, Lee, K, Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019;1(Long & Short Papers):4171-4186. https://doi.org/10.18653/v1/N19-1423 Google Scholar

Download MeSH Data. National Library of Medicine. Published 2022. Accessed December 1, 2022. https://www.nlm.nih.gov/databases/download/mesh.html Google Scholar

Fabiano, N, Hallgrimson, Z, Wong, S, et al. Selective tweeting of COVID-19 articles: does title or abstract positivity influence dissemination? Preprint. medRxiv. 2021. Published online 24 June 2021. https://doi.org/10.1101/2021.06.22.21259354 Google Scholar

Lockwood, G. Academic clickbait: articles with positively-framed titles, interesting phrasing, and no wordplay get more attention online. The Winnower. 2016;3. Published online 29 June 2016.Google Scholar

Hallock, RM, Bennett, TN. I’ll read that!: what title elements attract readers to an article? Teach Psychol. 2021;48(1):26-31. https://doi.org/10.1177/0098628320959948 Google Scholar

Älgå, A, Eriksson, O, Nordberg, M. The development of preprints during the COVID-19 pandemic. J Intern Med. 2021;290(2):480-483. https://doi.org/10.1111/joim.13240 Google Scholar

Article contents

Predicting the Utility of Scientific Articles for Emerging Pandemics Using Their Titles and Natural Language Processing

Abstract

Keywords

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests