Auditing subtype inconsistencies among gene ontology concepts

Rashmie Abeysinghe, Eugene W. Hinderer, Hunter N.B. Moseley, Licong Cui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Gene Ontology (GO) provides a controlled vocabulary for describing genes and related gene products. Quality assurance of GO is a vital aspect of the terminology management lifecycle. In this paper, we introduce a lexical-based inference approach to detecting subtype (is-a) inconsistencies among GO terms (i.e., biological concepts). We first model the name of each concept as a set of words. Then, we generate hierarchically linked and unlinked pairs of concepts (A, B), where A and B have the same number of words and share all of their words except one, so that each concept contains a single word the other lacks. Each linked concept-pair infers a linked term-pair made up of its two differing words, and each unlinked concept-pair infers an unlinked term-pair in the same way. A term-pair that appears as both linked and unlinked is considered a potential inconsistency, which may represent a subtype inconsistency between the original linked and unlinked concept-pairs. Applying this approach to the 03/28/2017 release of GO yielded a total of 3,715 potential subtype inconsistencies. Evaluation of a random sample of these potential inconsistencies revealed two types of potential errors in GO, missing subtype relations and incorrect subtype relations, and showed an accuracy of 56.33% for detecting such errors. This indicates that lexical-based inference using the set-of-words model is a promising way to facilitate quality improvement of GO.
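The inference steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: names are tokenized by lowercasing and splitting on whitespace; "linked" means a direct is-a edge from A to B; "unlinked" means neither concept is an ancestor of the other; and the helper names (`words`, `ancestors`, `find_inconsistencies`) and the toy concepts in the usage example are hypothetical, not taken from any GO release.

```python
from collections import defaultdict
from itertools import permutations


def words(name):
    """Set-of-words model (assumed tokenization): lowercased, whitespace-split."""
    return frozenset(name.lower().split())


def ancestors(isa):
    """Transitive closure of direct (child, parent) is-a edges; assumes a DAG."""
    parents = defaultdict(set)
    for child, parent in isa:
        parents[child].add(parent)
    memo = {}

    def visit(node):
        if node not in memo:
            result = set()
            for p in parents.get(node, ()):
                result |= {p} | visit(p)
            memo[node] = result
        return memo[node]

    return {node: visit(node) for node in list(parents)}


def find_inconsistencies(names, isa):
    """names: concept id -> name; isa: set of direct (child, parent) edges.

    Returns {term-pair: (linked concept-pairs, unlinked concept-pairs)}
    for every term-pair inferred as both linked and unlinked.
    """
    anc = ancestors(isa)
    # Group concepts by "all words but one": members of one group have the
    # same number of words and differ from each other in exactly one word.
    groups = defaultdict(list)
    for cid, name in names.items():
        ws = words(name)
        if len(ws) < 2:  # require at least one common word
            continue
        for w in ws:
            groups[ws - {w}].append((cid, w))

    linked = defaultdict(list)    # term-pair -> linked concept-pairs
    unlinked = defaultdict(list)  # term-pair -> unlinked concept-pairs
    for members in groups.values():
        for (a, wa), (b, wb) in permutations(members, 2):
            if wa == wb:  # identical word sets: no single differing word
                continue
            if (a, b) in isa:  # hypothetical rule: direct A is-a B
                linked[(wa, wb)].append((a, b))
            elif b not in anc.get(a, set()) and a not in anc.get(b, set()):
                unlinked[(wa, wb)].append((a, b))

    # A term-pair seen as both linked and unlinked is a potential inconsistency.
    return {tp: (linked[tp], unlinked[tp]) for tp in linked.keys() & unlinked.keys()}


# Hypothetical toy data, not from GO.
names = {
    "c1": "positive regulation of cell migration",
    "c2": "positive regulation of cell motility",
    "c3": "negative regulation of cell migration",
    "c4": "negative regulation of cell motility",
}
isa = {("c1", "c2")}
print(find_inconsistencies(names, isa))
# {('migration', 'motility'): ([('c1', 'c2')], [('c3', 'c4')])}
```

In this toy case, the linked pair (c1, c2) and the unlinked pair (c3, c4) both infer the term-pair (migration, motility), so the sketch flags it; a reviewer would then decide whether c3 is-a c4 is missing or c1 is-a c2 is incorrect. Grouping concepts by their "all words but one" key is a design choice made here to avoid comparing every pair of concepts directly.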

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Pages: 1242-1245
Number of pages: 4
ISBN (Electronic): 9781509030491
DOIs
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 - Nov 16 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Volume: 2017-January

Conference

Conference: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country/Territory: United States
City: Kansas City
Period: 11/13/17 - 11/16/17

Bibliographical note

Publisher Copyright:
© 2017 IEEE.

Funding

This work was supported by the National Science Foundation through grants 1657306 and 1252893, and by the National Institutes of Health through grant UL1TR001998-01.

Funders (funder number)
National Science Foundation (NSF): 1252893, 1657306
National Institutes of Health (NIH): UL1TR001998-01

ASJC Scopus subject areas

• Biomedical Engineering
• Health Informatics
