Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations

Gokhan Bakal, Preetham Talari, Elijah V. Kakani, Ramakanth Kavuluru

Research output: Contribution to journal › Article › peer-review

44 Scopus citations

Abstract

Background: Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Because not every candidate drug can be tested in animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is critical to understanding biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach.

Objective: To build high-accuracy supervised models that predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs.

Methods: We used 7000 hand-curated treats relations and 2918 hand-curated causes relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80–20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model more realistic scenarios.

Results: Our models predict treats and causes relations with high F-scores of 99% and 90%, respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. By reviewing highly scored false positives with our two physician co-authors, we were also able to predict some new plausible relations. Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset.

Conclusions: We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical-pattern-based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction.
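The feature extraction described in the Methods can be illustrated with a minimal sketch. It assumes that a "graph pattern" is the sequence of predicates along a simple path between two entities; the abstract does not specify the exact encoding, so this is an assumption. The triples, entity names, the `max_len` cutoff, and the helper names `build_adjacency` and `pattern_counts` are all hypothetical and are not taken from SemMedDB or from the authors' code.

```python
from collections import Counter, defaultdict

# Hypothetical toy triples (subject, predicate, object); not real SemMedDB data.
TRIPLES = [
    ("drugA", "INHIBITS", "geneX"),
    ("geneX", "CAUSES", "diseaseD"),
    ("drugA", "INTERACTS_WITH", "proteinP"),
    ("proteinP", "ASSOCIATED_WITH", "diseaseD"),
    ("drugA", "INHIBITS", "geneY"),
    ("geneY", "CAUSES", "diseaseD"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for subj, pred, obj in triples:
        adj[subj].append((pred, obj))
    return adj


def pattern_counts(adj, source, target, max_len=3):
    """Count predicate sequences along simple paths from source to target.

    A path is simple when no node is visited twice. max_len bounds the
    number of edges per path (an assumed cutoff for this sketch).
    """
    counts = Counter()

    def walk(node, visited, preds):
        if node == target and preds:
            counts[tuple(preds)] += 1
            return
        if len(preds) == max_len:
            return
        for pred, nxt in adj.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, preds + [pred])

    walk(source, {source}, [])
    return counts


# Each distinct pattern becomes one feature for the candidate entity pair.
features = pattern_counts(build_adjacency(TRIPLES), "drugA", "diseaseD")
for pattern, count in sorted(features.items()):
    print(" -> ".join(pattern), count)
```

Under these assumptions, the resulting pattern counts for each candidate pair could be fed as a feature vector into a classifier such as logistic regression, whose coefficients would then indicate which patterns are most discriminative.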

Original language: English
Pages (from-to): 189-199
Number of pages: 11
Journal: Journal of Biomedical Informatics
Volume: 82
State: Published - Jun 2018

Bibliographical note

Publisher Copyright:
© 2018 Elsevier Inc.

Keywords

  • Information extraction
  • Relation prediction
  • Semantic graph patterns

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
