Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations

Gokhan Bakal, Preetham Talari, Elijah V. Kakani, Ramakanth Kavuluru

Research output: Contribution to journal › Article › peer-review

51 Scopus citations

Abstract

Background: Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is critical to understanding biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach.

Objective: To build high-accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs.

Methods: We used 7000 hand-curated treats relations and 2918 hand-curated causes relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features for logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80–20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios.

Results: Our models predict treats and causes relations with high F-scores of 99% and 90%, respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. Working with two physician co-authors, we were also able to identify some new plausible relations among the false positives that our models scored highly. Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset.

Conclusions: We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence for direct prediction of biomedical relations based on graph features. Our work complements lexical pattern-based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction.
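
As a rough illustration of the feature idea described in the Methods, the sketch below enumerates simple paths between two entities in a small labeled graph and counts the predicate sequence along each path. It is a minimal sketch under stated assumptions, not the authors' implementation: the triples, entity names, the `max_len` cutoff, the use of directed edges, and the helper names `build_adjacency` and `path_patterns` are all hypothetical, and the abstract does not specify how patterns are actually encoded.

```python
from collections import Counter

# Toy (subject, predicate, object) triples. These are invented for
# illustration only and are NOT taken from SemMedDB.
TRIPLES = [
    ("drug_a", "INHIBITS", "gene_x"),
    ("gene_x", "CAUSES", "disease_d"),
    ("drug_a", "INTERACTS_WITH", "protein_p"),
    ("protein_p", "ASSOCIATED_WITH", "disease_d"),
    ("drug_a", "INHIBITS", "gene_y"),
    ("gene_y", "CAUSES", "disease_d"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = {}
    for subj, pred, obj in triples:
        adj.setdefault(subj, []).append((pred, obj))
    return adj


def path_patterns(adj, source, target, max_len=3):
    """Count predicate sequences along simple paths from source to target.

    A simple path never revisits a node. max_len caps the number of
    edges per path (an assumed cutoff, chosen here for illustration).
    """
    counts = Counter()

    def walk(node, visited, preds):
        if node == target and preds:
            counts[tuple(preds)] += 1  # one more path with this pattern
            return
        if len(preds) == max_len:
            return
        for pred, nxt in adj.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, preds + [pred])

    walk(source, {source}, [])
    return counts


adj = build_adjacency(TRIPLES)
features = path_patterns(adj, "drug_a", "disease_d")
for pattern, n in sorted(features.items()):
    print(" -> ".join(pattern), n)
```

Each distinct predicate sequence would become one feature column for the entity pair, with the count (or a binary indicator) as its value. A classifier such as logistic regression could then be trained on these vectors for labeled pairs.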

Original language: English
Pages (from-to): 189-199
Number of pages: 11
Journal: Journal of Biomedical Informatics
Volume: 82
State: Published - Jun 2018

Bibliographical note

Publisher Copyright:
© 2018 Elsevier Inc.

Funding

We thank reviewers for their constructive criticism that helped improve the quality of this manuscript. We are grateful for the support of the U.S. National Library of Medicine through NIH grant R21LM012274 and also thankful for partial support offered by the U.S. National Center for Advancing Translational Sciences via grant UL1TR001998. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We also acknowledge the Ministry of National Education, Republic of Turkey, for providing financial support to Gokhan Bakal with a full scholarship for his doctoral studies.

Funders (with funder number where given)
National Institutes of Health (NIH)
National Childhood Cancer Registry – National Cancer Institute: P30CA177558
U.S. National Library of Medicine: R21LM012274
National Center for Advancing Translational Sciences (NCATS): UL1TR001998
Milli Eğitim Bakanliği

    Keywords

    • Information extraction
    • Relation prediction
    • Semantic graph patterns

    ASJC Scopus subject areas

    • Health Informatics
    • Computer Science Applications
