From Syntactic Relations to Semantic Predications: Porting Open Information Extraction to Biomedicine

Grants and Contracts Details


Extracting meaningful relations from natural language narratives is an essential task for hypothesis generation, knowledge discovery, and information retrieval in biomedicine. Relation extraction has long been a challenging problem in the biomedical and clinical domains, owing to the complexity and idiosyncrasies of the language constructs used to express entities and the relations between them. Most relation extraction efforts in the published literature have a very narrow focus. For example, several research groups extract gene-protein or gene-gene interactions; in the clinical domain, recent results focus on drug-drug and drug-disease interactions mentioned in clinical narratives from electronic medical records (EMRs). The only effort that extracts a broad set of relations is the SemRep program being developed at the National Library of Medicine (NLM). However, although SemRep achieves a precision of 75%, its recall for relation types of interest is only 20%. We propose a novel, high-accuracy, supervised relation extraction framework that leverages the open information extraction paradigm and exploits the database of high-precision relations extracted by SemRep. A domain-independent evaluation will be conducted on a gold standard dataset built by NLM researchers for evaluating SemRep. An application-specific evaluation will be conducted in the context of information retrieval, using the extracted relations for the document and passage retrieval tasks of the TREC Genomics challenges. A second application-oriented evaluation will be conducted in the context of literature-based knowledge discovery: the goal is to rediscover well-known findings from implicit, indirect connections among relations extracted from literature published before the original discoveries were made.
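The abstract does not say how the SemRep predications will be turned into training data. One common way to pair a high-precision relation database with Open IE output is distant supervision: an Open IE triple is labeled with a database predicate when that predicate links the same two concepts in the same sentence. The sketch below illustrates only that general technique, not the project's actual method. The classes `OpenIETriple` and `Predication`, the function `label_triples`, and the concept IDs are all hypothetical, and it assumes both triple arguments have already been mapped to concept identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OpenIETriple:
    """Hypothetical Open IE extraction whose arguments are already mapped to concept IDs."""
    sentence_id: str
    subj_cui: str
    relation_phrase: str
    obj_cui: str


@dataclass(frozen=True)
class Predication:
    """Hypothetical SemRep-style predication: subject concept, PREDICATE, object concept."""
    sentence_id: str
    subj_cui: str
    predicate: str
    obj_cui: str


def label_triples(
    triples: List[OpenIETriple],
    predications: List[Predication],
    negative_label: Optional[str] = "NONE",
) -> List[Tuple[OpenIETriple, str]]:
    """Distant supervision: each triple takes the predicate of a predication that
    links the same two concepts (same direction) in the same sentence.
    Unmatched triples get `negative_label`, or are dropped if it is None."""
    # Index predications by (sentence, subject, object) for constant-time lookup;
    # if several predications share a key, the last one wins.
    index: Dict[Tuple[str, str, str], str] = {
        (p.sentence_id, p.subj_cui, p.obj_cui): p.predicate for p in predications
    }
    labeled: List[Tuple[OpenIETriple, str]] = []
    for t in triples:
        label = index.get((t.sentence_id, t.subj_cui, t.obj_cui))
        if label is None:
            if negative_label is None:
                continue  # no matching predication: leave this triple unlabeled
            label = negative_label
        labeled.append((t, label))
    return labeled


# Toy data with placeholder concept IDs (not real UMLS CUIs).
triples = [
    OpenIETriple("s1", "C_A", "is used to treat", "C_B"),
    OpenIETriple("s1", "C_A", "was compared with", "C_C"),
]
preds = [Predication("s1", "C_A", "TREATS", "C_B")]

labeled = label_triples(triples, preds)
for triple, label in labeled:
    print(f"{triple.relation_phrase!r} -> {label}")
```

With the recall of roughly 20% reported above, many unmatched triples may still express true relations. Treating them as negatives therefore adds label noise, so passing `negative_label=None` keeps only the triples that SemRep positively confirms.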
Effective start/end date: 6/1/16 to 5/31/19


  • National Library of Medicine: $373,960.00

