Abstract
Gene Ontology (GO) provides a controlled vocabulary for describing genes and gene products. Quality assurance of GO is a vital aspect of the terminology management lifecycle. In this paper, we introduce a lexical-based inference approach to detecting subtype (or is-a) inconsistencies among GO terms (i.e., biological concepts). We first model the name of each concept as a set of words. Then, we generate hierarchically linked and unlinked pairs of concepts (A, B), where A and B have the same number of words and share all words except a single differing word. Each linked concept-pair infers a linked term-pair, and each unlinked concept-pair infers an unlinked term-pair. A term-pair appearing as both linked and unlinked is considered a potential inconsistency, which may represent a subtype inconsistency between the original linked and unlinked concept-pairs. Applying this approach to the 03/28/2017 release of GO yielded a total of 3,715 potential subtype inconsistencies. Evaluation of a random sample of potential inconsistencies revealed two types of potential errors in GO, missing subtype relations and incorrect subtype relations, and the approach achieved an accuracy of 56.33% in detecting such errors. This indicates that the lexical-based inference approach using the set-of-words model is a promising way to facilitate quality improvement of GO.
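The pairing and inference steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy concept IDs (`T1` to `T4`), the concept names, and the toy hierarchy are all invented for illustration, and "hierarchically linked" is assumed here to mean that one concept is an ancestor of the other through is-a relations.

```python
from itertools import permutations


def words(name):
    """Model a concept name as a set of lowercase words."""
    return frozenset(name.lower().split())


def ancestors(concept, parents):
    """Collect all ancestors of a concept by following is-a (parent) links."""
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def potential_inconsistencies(names, parents):
    """Return term-pairs that are inferred as both linked and unlinked."""
    linked, unlinked = {}, {}
    for a, b in permutations(names, 2):
        wa, wb = words(names[a]), words(names[b])
        only_a, only_b = wa - wb, wb - wa
        # Keep pairs with equal word counts, shared words, and one differing word.
        if len(wa) != len(wb) or not (wa & wb):
            continue
        if len(only_a) != 1 or len(only_b) != 1:
            continue
        term_pair = (next(iter(only_a)), next(iter(only_b)))
        if b in ancestors(a, parents):
            linked.setdefault(term_pair, []).append((a, b))
        elif a in ancestors(b, parents):
            continue  # counted as linked when the pair is visited as (b, a)
        else:
            unlinked.setdefault(term_pair, []).append((a, b))
    return {tp: (linked[tp], unlinked[tp]) for tp in linked.keys() & unlinked.keys()}


# Toy data (hypothetical; not taken from any GO release).
names = {
    "T1": "mitochondrial membrane organization",
    "T2": "organelle membrane organization",
    "T3": "mitochondrial fission",
    "T4": "organelle fission",
}
parents = {"T1": ["T2"]}  # only T1 is-a T2 is asserted in this toy hierarchy

result = potential_inconsistencies(names, parents)
print(result)
```

In this toy hierarchy the term-pair (mitochondrial, organelle) is inferred as linked from (T1, T2) and as unlinked from (T3, T4), so it is flagged as a potential inconsistency. Whether such a flag reflects a missing or an incorrect subtype relation would still have to be judged by review, as in the paper's evaluation of a random sample.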
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 |
Editors | Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin |
Pages | 1242-1245 |
Number of pages | 4 |
ISBN (Electronic) | 9781509030491 |
State | Published - Dec 15 2017 |
Event | 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States. Duration: Nov 13 2017 → Nov 16 2017 |
Publication series
Name | Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 |
---|---|
Volume | 2017-January |
Conference
Conference | 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 |
---|---|
Country/Territory | United States |
City | Kansas City |
Period | 11/13/17 → 11/16/17 |
Bibliographical note
Publisher Copyright: © 2017 IEEE.
Funding
This work was supported by the National Science Foundation through grants 1657306 and 1252893, and by the National Institutes of Health through grant UL1TR001998-01.
Funders | Funder number |
---|---|
National Science Foundation (NSF) | 1252893, 1657306 |
National Institutes of Health (NIH) | UL1TR001998-01 |
ASJC Scopus subject areas
- Biomedical Engineering
- Health Informatics