Abstract
The growing body of knowledge in biomedicine is too vast for human consumption. Hence there is a need for automated systems able to navigate and distill the emerging wealth of information. One fundamental task to that end is relation extraction, whereby linguistic expressions of semantic relationships between biomedical entities are recognized and extracted. In this study, we propose a novel distant supervision approach for extracting binary treatment relationships, in which high-quality positive and negative training examples are generated from PubMed abstracts by leveraging their associated MeSH subheadings. The quality of the generated examples is assessed by the quality of the supervised models they induce; that is, the mean performance of the trained models (derived via bootstrapped ensembling) on a gold standard test set is used as a proxy for data quality. We show that our approach is preferable to traditional distant supervision for treatment relations and comes closer to human crowd annotations in terms of annotation quality. For treatment relations, our generated training data performs at 81.38%, compared to 64.33% for traditional distant supervision and 90.57% for crowd-sourced annotations, on the model-wide PR-AUC metric. We also demonstrate that examples generated using our method can be used to augment crowd-sourced datasets: augmented models improve over non-augmented models by more than two absolute points on the more established F1 metric. Lastly, we demonstrate that performance can be further improved by implementing a classification loss that is resistant to label noise.
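The abstract uses the mean PR-AUC of several bootstrapped models on a gold standard test set as its proxy for training-data quality. Below is a minimal sketch of that scoring step. It assumes PR-AUC is computed as average precision, which is one common estimator; the abstract does not say which estimator or interpolation the authors used. The function names `average_precision` and `mean_pr_auc` are hypothetical, and each list in `score_runs` stands in for the test-set scores of one model trained on one bootstrap resample.

```python
from statistics import mean


def average_precision(labels, scores):
    """PR-AUC estimated as average precision (an assumed estimator).

    labels: 1 for a true treatment relation, 0 otherwise (gold standard).
    scores: model confidence for each test example.
    Tied scores keep their input order, so ties are broken arbitrarily.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined without positive examples")
    # Rank test examples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the cut-off where this positive is retrieved.
            precision_sum += hits / rank
    return precision_sum / n_pos


def mean_pr_auc(labels, score_runs):
    """Average PR-AUC over models trained on different bootstrap resamples."""
    return mean(average_precision(labels, scores) for scores in score_runs)


gold = [1, 0, 1, 0]
runs = [[0.9, 0.8, 0.7, 0.1], [0.9, 0.2, 0.8, 0.1]]
print(round(mean_pr_auc(gold, runs), 4))  # prints 0.9167
```

Because PR-AUC summarizes precision over every decision threshold, it does not depend on a single cut-off. A single-threshold metric such as F1 does, which is one reason to report both when comparing training sets.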
| Original language | English |
|---|---|
| Pages (from-to) | 18-26 |
| Number of pages | 9 |
| Journal | Artificial Intelligence in Medicine |
| Volume | 98 |
| State | Published - Jul 2019 |
Bibliographical note
Publisher Copyright: © 2019 Elsevier B.V.
Funding
We are grateful to the U.S. National Library of Medicine for offering the primary support for this work through grant R21LM012274. We are also thankful for additional support by the National Center for Advancing Translational Sciences through grant UL1TR001998. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
| Funders | Funder number |
|---|---|
| U.S. National Library of Medicine | R21LM012274 |
| National Center for Advancing Translational Sciences (NCATS) | UL1TR001998 |
| Nvidia | |
Keywords
- Distant supervision
- MeSH subheadings
- Medical treatment relation
- Relation extraction
ASJC Scopus subject areas
- Medicine (miscellaneous)
- Artificial Intelligence