TY - JOUR
T1 - Enhancing timeliness of drug overdose mortality surveillance
T2 - A machine learning approach
AU - Ward, Patrick J.
AU - Rock, Peter J.
AU - Slavova, Svetla
AU - Young, April M.
AU - Bunn, Terry L.
AU - Kavuluru, Ramakanth
N1 - Publisher Copyright: © 2019 Ward et al.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - Background: Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance. Methods: Using 2017-2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created, as well as features indicating the part of speech of each word. These features were then used to train machine learning classifiers on 2017 data. The resulting models were tested on 2018 Kentucky data and compared to a simple rule-based classification approach. Documented code for this method is available for reuse and extension: https://github.com/pjward5656/dcnlp. Results: The top-scoring machine learning model achieved 0.96 positive predictive value (PPV) and 0.98 sensitivity, for an F-score of 0.97, in identifying fatal drug overdoses on test data. This model achieved significantly higher sensitivity (p < 0.001) than the rule-based approach. Additional feature engineering may improve the model's predictions. The model can be deployed on death certificates as soon as the free text is available, eliminating the time needed to code the death certificates. Conclusion: Machine learning using natural language processing is a relatively new approach in the context of surveillance of health conditions. This method presents an accessible application of machine learning that improves the timeliness of drug overdose mortality surveillance. As such, it can be employed to inform public health responses to the drug overdose epidemic in near real time, as opposed to several weeks following events.
UR - http://www.scopus.com/inward/record.url?scp=85073447069&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073447069&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0223318
DO - 10.1371/journal.pone.0223318
M3 - Article
C2 - 31618226
AN - SCOPUS:85073447069
SN - 1932-6203
VL - 14
JO - PLoS ONE
JF - PLoS ONE
IS - 10
M1 - e0223318
ER -