TY - JOUR
T1 - Data and systems for medication-related text classification and concept normalization from Twitter
T2 - Insights from the Social Media Mining for Health (SMM4H)-2017 shared task
AU - Sarker, Abeed
AU - Belousov, Maksim
AU - Friedrichs, Jasper
AU - Hakala, Kai
AU - Kiritchenko, Svetlana
AU - Mehryary, Farrokh
AU - Han, Sifei
AU - Tran, Tung
AU - Rios, Anthony
AU - Kavuluru, Ramakanth
AU - De Bruijn, Berry
AU - Ginter, Filip
AU - Mahata, Debanjan
AU - Mohammad, Saif M.
AU - Nenadic, Goran
AU - Gonzalez-Hernandez, Graciela
N1 - Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2018/10/1
Y1 - 2018/10/1
N2 - Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3), and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
KW - machine learning
KW - natural language processing
KW - pharmacovigilance
KW - social media
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85054889806&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054889806&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocy114
DO - 10.1093/jamia/ocy114
M3 - Article
C2 - 30272184
AN - SCOPUS:85054889806
SN - 1067-5027
VL - 25
SP - 1274
EP - 1283
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 10
ER -