A Lexical Approach to Identifying Subtype Inconsistencies in Biomedical Terminologies

Rashmie Abeysinghe, Fengbo Zheng, Eugene W. Hinderer, Hunter N.B. Moseley, Licong Cui

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

7 Scopus citations


We introduce a lexical-based inference approach for identifying subtype (or is-a relation) inconsistencies in biomedical terminologies. Given a terminology, we first represent the name of each concept in the terminology as a sequence of words. We then generate hierarchically linked and unlinked pairs of concepts, such that the two concepts in a pair have the same number of words, and contain at least one word in common and a fixed number n of different words (n = 1, 2, 3, 4, 5). The linked and unlinked concept-pairs further infer corresponding linked and unlinked term-pairs, respectively. If a linked concept-pair and an unlinked concept-pair infer the same term-pair, we consider this a potential subtype inconsistency, which may indicate a missing subtype relation or an incorrect subtype relation. We applied this approach to Gene Ontology (GO), National Cancer Institute Thesaurus (NCIt), and SNOMED CT. A total of 4,841 potential subtype inconsistencies were found in GO, 2,677 in NCIt, and 53,782 in SNOMED CT. Domain experts evaluated a random sample of 211 potential inconsistencies in GO, and verified that 124 of them are valid (i.e., a precision of 58.77% for detecting subtype inconsistencies in GO). We also performed a preliminary study on the extent to which external knowledge in the Unified Medical Language System (UMLS) can provide supporting evidence for validating the detected potential inconsistencies: 0.54% (= 26/4,841) for GO, 11.43% (= 306/2,677) for NCIt, and 3.61% (= 1,940/53,782) for SNOMED CT. Results indicate that our lexical-based inference approach is a promising way to identify subtype inconsistencies and facilitates the quality improvement of biomedical terminologies.
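The pairing and inference steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made here for illustration: names are split on whitespace and lowercased, the "different words" are computed as set differences between the two names, a pair counts as "linked" when one concept is an ancestor of the other, and the inferred term-pair is the ordered pair of differing words. The function names (`term_pair`, `find_inconsistencies`) and the toy concepts are hypothetical and are not taken from the paper's results.

```python
from itertools import combinations


def tokenize(name):
    # Assumption: a concept name is a whitespace-separated, lowercased word sequence.
    return tuple(name.lower().split())


def term_pair(words_a, words_b, n):
    """Return (diff_a, diff_b) if the two names qualify for a given n, else None."""
    if len(words_a) != len(words_b):
        return None  # names must have the same number of words
    set_a, set_b = set(words_a), set(words_b)
    if not (set_a & set_b):
        return None  # names must share at least one word
    diff_a = tuple(sorted(set_a - set_b))
    diff_b = tuple(sorted(set_b - set_a))
    if len(diff_a) != n or len(diff_b) != n:
        return None  # names must differ in exactly n words
    return diff_a, diff_b


def ancestors(concept, parents):
    """Collect all ancestors of a concept by following is-a links upward."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def find_inconsistencies(names, parents, n=1):
    """Map each term-pair inferred by both linked and unlinked concept-pairs
    to (linked concept-pairs, unlinked concept-pairs)."""
    words = {c: tokenize(name) for c, name in names.items()}
    anc = {c: ancestors(c, parents) for c in names}
    linked, unlinked = {}, {}
    for a, b in combinations(names, 2):
        for child, parent in ((a, b), (b, a)):
            tp = term_pair(words[child], words[parent], n)
            if tp is None:
                continue
            if parent in anc[child]:
                linked.setdefault(tp, []).append((child, parent))
            elif child not in anc[parent]:
                # Neither concept is an ancestor of the other: an unlinked pair.
                unlinked.setdefault(tp, []).append((child, parent))
    return {tp: (linked[tp], unlinked[tp]) for tp in linked.keys() & unlinked.keys()}


# Toy data (hypothetical): C1 is-a C2, while C3 and C4 are not linked.
names = {
    "C1": "cardiac muscle cell proliferation",
    "C2": "striated muscle cell proliferation",
    "C3": "cardiac muscle tissue development",
    "C4": "striated muscle tissue development",
}
parents = {"C1": ["C2"]}
result = find_inconsistencies(names, parents, n=1)
print(result)
```

In this toy case, the linked pair (C1, C2) and the unlinked pair (C3, C4) both infer the term-pair ("cardiac", "striated"), so (C3, C4) would be flagged for expert review as a possibly missing is-a relation, or (C1, C2) as a possibly incorrect one. The all-pairs loop is quadratic in the number of concepts; a terminology the size of SNOMED CT would likely need an index (for example, grouping names by word count and shared words) rather than this direct enumeration.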

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
Number of pages: 8
ISBN (Electronic): 9781538654880
State: Published - Jan 21 2019
Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: Dec 3 2018 to Dec 6 2018

Bibliographical note

Publisher Copyright:
© 2018 IEEE.


  • Gene Ontology
  • incorrect subtype relations
  • missing subtype relations
  • National Cancer Institute Thesaurus
  • subtype inconsistencies
  • terminology quality assurance
  • Unified Medical Language System

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

