Fast and Fine: NLP Methods for Near Real-Time and Fine-Grained Overdose Surveillance

Grants and Contracts Details


Overdose (OD) deaths reached a record 100,000 during the 12-month period from April 2020 to April 2021. The loss of life, the financial toll, and the staggering amount of disability (arising from nonfatal OD events) have plagued the nation for the past two decades and have been compounded by the ongoing COVID-19 pandemic. To aid in rapid allocation of resources and mitigate the OD epidemic, methods and tools that improve OD surveillance, in terms of both timeliness and granularity, are urgently needed. In this application, we propose to use advances in natural language processing (NLP) and machine learning (ML) to build and validate models that work directly on emergency medical service (EMS) reports and triage/discharge notes from emergency departments (EDs). The goal is first to classify OD cases from these narratives and then to identify the specific substances used, as reported in them. This is expected to drastically increase timeliness, especially for fatal ODs, which take a substantial amount of time (weeks to months) to be finalized through the standard practice involving medical examiners, coroners, and centralized coding at the CDC. Because EMS and EDs are typically the first points of interaction of OD patients with the healthcare system, generating fatal OD estimates from them will result in faster surveillance. For nonfatal cases, EMS and ED visits are indispensable sources of surveillance. Our focus on textual narratives arises from observations that structured sources do not adequately capture OD events. Our central hypothesis is that surveillance estimates generated through our NLP models will be superior to those generated by standard rule-based case definitions, especially when using a balanced metric (e.g., F1 score) that considers both sensitivity and precision.
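As an illustration of the balanced metric named in the hypothesis, the sketch below computes precision, sensitivity, and F1 score for two sets of binary OD predictions. It is a minimal example under stated assumptions, not the project's evaluation code: the function name, the toy labels, and the two prediction lists (one standing in for a rule-based case definition, one for an NLP model) are all hypothetical.

```python
def precision_sensitivity_f1(y_true, y_pred):
    """Return (precision, sensitivity, F1) for binary labels (1 = OD case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f1


# Hypothetical toy data: 8 narratives, 4 of which are true OD cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
rule_pred = [1, 1, 0, 0, 0, 0, 0, 0]   # assumed rule-based output: precise, misses cases
model_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # assumed model output: one miss, one false alarm

rule_scores = precision_sensitivity_f1(y_true, rule_pred)
model_scores = precision_sensitivity_f1(y_true, model_pred)
print("rule-based:", rule_scores)
print("model:     ", model_scores)
```

In this toy case the rule-based predictions have perfect precision but low sensitivity, so their F1 score is lower than that of the model predictions, which trade a little precision for higher sensitivity.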
Effective start/end date: 9/30/22 to 9/29/25


  • National Institute on Drug Abuse: $1,344,685.00

