Advanced End-to-End Relation Extraction with Deep Neural Networks

Grants and Contracts Details


The fundamental unit of knowledge in information science is typically encoded as a relation connecting a pair of entities via a predicate (or relation type). Examples in biomedicine include gene-disease associations, drug-disease treatment relations, and drug-drug interactions. These are directly linked, respectively, to identifying new drug targets (enabling precision medicine), finding new indications (e.g., off-label use, repositioning), and surveilling adverse events. As such, relations are central to understanding the mechanisms that guide disease etiology, progression, and treatment, and discovering them is thus at the heart of the biomedical research enterprise. As new relations are discovered, researchers report them in the scientific literature and clinicians record them in clinical notes. Patients also report them (e.g., side effects) on social media. With the exponential growth of these unstructured textual sources, purely manual curation of relations is impractical. In this context, an important area of research in biomedical natural language processing (BioNLP) is extracting this relational knowledge from free text, a process henceforth termed relation extraction (RE) and the main theme of this proposal. RE systems are generally implemented as pipelines of three constituent subtasks (named entity recognition, entity normalization, and relation classification), each handled by a separately trained model. There is recent evidence that end-to-end modeling, in which a single model directly predicts the relations (and the corresponding entity spans), can be more effective than the pipeline approach. In this project, we propose novel end-to-end RE approaches that leverage recent advances in neural networks and build on preliminary results from our NLM-funded RE project (R21LM012274).
Beyond end-to-end modeling, our neural architectures are tailored to a variety of advanced RE tasks, including cross-sentence RE (where relations are expressed across multiple sentences) and n-ary RE (where relations connect more than two entities), evaluated on both well-known public datasets and newly curated datasets. Our main hypothesis is that our end-to-end joint modeling approaches will yield statistically significant performance gains in RE across the board when compared with traditional pipelines of separately trained models.
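To make the distinctions above concrete, a relation can be represented as a predicate over two or more typed entity mentions, each tied to a text span and a sentence index. The following is a minimal sketch under an assumed, hypothetical schema: the class names, the `mention` helper, the `drug_response` predicate, and the example sentences are illustrative only and are not taken from this project's datasets or models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EntityMention:
    """A typed entity span; the sentence index lets a relation cross sentences."""
    text: str
    entity_type: str
    sentence: int           # index of the sentence containing the mention
    span: Tuple[int, int]   # (start, end) character offsets within that sentence


@dataclass(frozen=True)
class Relation:
    """A predicate linking two (binary) or more (n-ary) entity mentions."""
    predicate: str
    arguments: Tuple[EntityMention, ...]

    @property
    def arity(self) -> int:
        return len(self.arguments)

    @property
    def is_cross_sentence(self) -> bool:
        # True when the arguments are spread over more than one sentence.
        return len({arg.sentence for arg in self.arguments}) > 1


def mention(sentences: List[str], i: int, text: str, entity_type: str) -> EntityMention:
    """Hypothetical helper: build a mention from the first occurrence of `text` in sentence i."""
    start = sentences[i].index(text)
    return EntityMention(text, entity_type, i, (start, start + len(text)))


# Illustrative sentences (not from any project dataset).
sentences = [
    "Aspirin is commonly used to treat headache.",
    "The tumor carried the EGFR L858R mutation.",
    "It responded to gefitinib.",
]

# Binary, within-sentence relation: (drug, treats, disease).
treats = Relation("treats", (
    mention(sentences, 0, "Aspirin", "Drug"),
    mention(sentences, 0, "headache", "Disease"),
))

# Ternary, cross-sentence relation linking a drug, a gene, and a mutation.
response = Relation("drug_response", (
    mention(sentences, 2, "gefitinib", "Drug"),
    mention(sentences, 1, "EGFR", "Gene"),
    mention(sentences, 1, "L858R", "Mutation"),
))

print(treats.arity, treats.is_cross_sentence)      # 2 False
print(response.arity, response.is_cross_sentence)  # 3 True
```

In a pipeline, the mentions (spans and types) would come from a separately trained entity recognizer before a classifier assigns the predicate. An end-to-end model would instead predict the spans and the predicate jointly.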
Effective start/end date: 7/1/20 to 3/31/24


  • National Library of Medicine: $1,024,053.00

