Grants and Contracts Details
Description
An ontology is a formal representation of the concepts in a domain of discourse and the relationships between those concepts. Ontologies play an increasingly important role in knowledge representation, data integration, data management, natural language processing, information retrieval, and decision support, especially in health information systems and biomedical research. Although ontologies are well founded and well studied in the theory of Description Logics, ontologies in practice suffer from various quality issues, and general, scalable methods for quality evaluation are lacking. In particular, the logical property of completeness is often not guaranteed for large biomedical ontologies. Here completeness (or comprehensive coverage) concerns the coverage of concepts and related terms, gaps in hierarchical and semantic relationships, and incomplete definitions of concepts. Such quality issues, if not addressed, can affect the quality of all downstream information systems that rely on these ontologies as knowledge sources. Meanwhile, finding such issues in large ontologies is like finding needles in a haystack and poses computational challenges.
This project focuses on investigating missing hierarchical relations and concepts to enhance the completeness of large biomedical ontologies. Most existing approaches to detecting incompleteness in biomedical ontologies fall into two groups. Some rely purely on extrinsic knowledge sources and neglect sophisticated intrinsic knowledge. Others use intrinsic knowledge only to indicate areas with potential quality issues, so arduous manual review by domain experts is still required to examine those issues and find solutions. In this project, we propose to develop scalable, effective methods to investigate the completeness of large ontologies in practice, and thus enhance their quality. The specific aims of this project are as follows.
Objective 1: Detecting missing hierarchical relations and concepts based on intrinsic knowledge. We propose to detect incompleteness using Formal Concept Analysis (FCA) based on the logical definitions of ontology concepts. The detected incompleteness will include missing hierarchical relations and missing concepts.
Objective 2: Generating proper terms for missing concepts. We propose to use a long short-term memory (LSTM) network, a type of recurrent neural network, to predict names or terms for missing concepts.
Objective 3: Evaluating the effectiveness of incompleteness detection and term generation. We will perform rigorous evaluation in two ways. The first is to use two kinds of external sources: external ontologies and biomedical literature. The second is to ask domain experts to evaluate randomly selected samples.
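To make the FCA-based detection in Objective 1 concrete, the following is a minimal sketch on a toy example, not the project's actual implementation. It assumes each concept's logical definition can be flattened into a set of attributes, treats "the parent's attributes are a proper subset of the child's" as evidence of a hierarchical relation, and enumerates FCA intents by brute-force intersection. The concept names, attribute names, and function names are all hypothetical, and a large ontology would need a far more scalable algorithm.

```python
from itertools import combinations

# Hypothetical toy ontology: each logical definition is flattened to an attribute set.
definitions = {
    "Disorder": {"is_disorder"},
    "Infection": {"is_disorder", "caused_by_pathogen"},
    "Lung disorder": {"is_disorder", "site_lung"},
    "Bacterial pneumonia": {"is_disorder", "caused_by_pathogen", "site_lung", "agent_bacteria"},
    "Viral pneumonia": {"is_disorder", "caused_by_pathogen", "site_lung", "agent_virus"},
}

# Stated is-a relations as (child, parent) pairs; one relation is deliberately left out.
is_a = {
    ("Infection", "Disorder"),
    ("Lung disorder", "Disorder"),
    ("Bacterial pneumonia", "Infection"),
    ("Bacterial pneumonia", "Lung disorder"),
    ("Viral pneumonia", "Lung disorder"),
}


def transitive_closure(edges):
    """Return all (descendant, ancestor) pairs implied by the stated edges."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def intents(defs):
    """Non-empty FCA intents: the attribute sets closed under pairwise intersection."""
    found = {frozenset(attrs) for attrs in defs.values()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(found), 2):
            common = a & b
            if common and common not in found:
                found.add(common)
                changed = True
    return found


def missing_relations(defs, edges):
    """Pairs whose definitions imply subsumption but that have no is-a path."""
    closure = transitive_closure(edges)
    return sorted(
        (child, parent)
        for child, child_attrs in defs.items()
        for parent, parent_attrs in defs.items()
        if parent_attrs < child_attrs and (child, parent) not in closure
    )


def missing_concepts(defs):
    """Intents that no existing concept has as its definition."""
    existing = {frozenset(attrs) for attrs in defs.values()}
    return sorted(sorted(intent) for intent in intents(defs) - existing)


print(missing_relations(definitions, is_a))
print(missing_concepts(definitions))
```

In this toy data, the sketch flags the unstated relation from "Viral pneumonia" to "Infection" and one unnamed attribute set shared by the two pneumonia concepts. Such an unnamed set is the kind of candidate that Objective 2 would then need a term for.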
Status | Finished |
---|---|
Effective start/end date | 9/1/18 → 12/31/18 |