Distant supervision for treatment relation extraction by leveraging MeSH subheadings

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

The growing body of knowledge in biomedicine is too vast for human consumption. Hence there is a need for automated systems able to navigate and distill the emerging wealth of information. One fundamental task to that end is relation extraction, whereby linguistic expressions of semantic relationships between biomedical entities are recognized and extracted. In this study, we propose a novel distant supervision approach for relation extraction of binary treatment relationships such that high quality positive/negative training examples are generated from PubMed abstracts by leveraging associated MeSH subheadings. The quality of generated examples is assessed based on the quality of supervised models they induce; that is, the mean performance of trained models (derived via bootstrapped ensembling) on a gold standard test set is used as a proxy for data quality. We show that our approach is preferable to traditional distant supervision for treatment relations and is closer to human crowd annotations in terms of annotation quality. For treatment relations, our generated training data performs at 81.38%, compared to traditional distant supervision at 64.33% and crowd-sourced annotations at 90.57% on the model-wide PR-AUC metric. We also demonstrate that examples generated using our method can be used to augment crowd-sourced datasets. Augmented models improve over non-augmented models by more than two absolute points on the more established F1 metric. We lastly demonstrate that performance can be further improved by implementing a classification loss that is resistant to label noise.
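The evaluation idea in the abstract, scoring a training set by the mean test performance of models trained on bootstrap resamples of it, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: average precision is used here as one common estimator of PR-AUC (the abstract does not say which estimator was used), `fit_token_model` is a hypothetical toy classifier standing in for the real relation extraction model, and the sentences are invented for illustration only. The sketch does not cover how examples are generated from MeSH subheadings.

```python
import random
from statistics import mean


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank test examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


def fit_token_model(examples):
    """Hypothetical toy model: tokens vote +1/-1 by the labels they co-occur with."""
    weights = {}
    for text, label in examples:
        for tok in text.lower().split():
            weights[tok] = weights.get(tok, 0) + (1 if label == 1 else -1)
    return lambda text: sum(weights.get(t, 0) for t in text.lower().split())


def mean_pr_auc(train, test, fit, n_models=10, seed=0):
    """Train one model per bootstrap resample of `train`; average test PR-AUC."""
    rng = random.Random(seed)
    test_x = [x for x, _ in test]
    test_y = [y for _, y in test]
    aucs = []
    for _ in range(n_models):
        # Sample with replacement, same size as the original training set.
        resample = [rng.choice(train) for _ in train]
        model = fit(resample)
        aucs.append(average_precision(test_y, [model(x) for x in test_x]))
    return mean(aucs)


# Invented toy data: (sentence, 1 if it expresses a treatment relation else 0).
train = [
    ("drug treats disease", 1),
    ("drug used for therapy of disease", 1),
    ("drug causes disease", 0),
    ("disease unrelated to drug", 0),
]
test = [("drug treats condition", 1), ("drug causes condition", 0)]
score = mean_pr_auc(train, test, fit_token_model)
```

Under this scheme, two candidate training sets (for example, one produced by distant supervision and one by crowd annotation) would be compared by calling `mean_pr_auc` on each with the same gold standard `test` set.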

Original language: English
Pages (from-to): 18-26
Number of pages: 9
Journal: Artificial Intelligence in Medicine
Volume: 98
State: Published - Jul 2019

Bibliographical note

Publisher Copyright:
© 2019 Elsevier B.V.

Funding

We are grateful to the U.S. National Library of Medicine for offering the primary support for this work through grant R21LM012274. We are also thankful for additional support by the National Center for Advancing Translational Sciences through grant UL1TR001998. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

Funder: Funder number
U.S. National Library of Medicine: R21LM012274
National Center for Advancing Translational Sciences (NCATS): UL1TR001998
Nvidia

    Keywords

    • Distant supervision
    • MeSH subheadings
    • Medical treatment relation
    • Relation extraction

    ASJC Scopus subject areas

    • Medicine (miscellaneous)
    • Artificial Intelligence
