A Lexical Approach to Identifying Subtype Inconsistencies in Biomedical Terminologies

Rashmie Abeysinghe, Fengbo Zheng, Eugene W. Hinderer, Hunter N.B. Moseley, Licong Cui

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

7 Scopus citations

Abstract

We introduce a lexical-based inference approach for identifying subtype (or is{-}a relation) inconsistencies in biomedical terminologies. Given a terminology, we first represent the name of each concept in the terminology as a sequence of words. We then generate hierarchically-linked and-unlinked pairs of concepts, such that the two concepts in a pair have the same number of words, and contain at least one word in common and a fixed number n of different words (n = 1,2,3,4,5). The linked and unlinked concept-pairs further infer corresponding linked and unlinked term-pairs, respectively. If a linked concept-pair and an unlinked concept-pair infer the same term-pair, we consider this as a potential subtype inconsistency, which may indicate a missing subtype relation or an incorrect subtype relation. We applied this approach to Gene Ontology (GO), National Cancer Institute thesaurus (NCIt) and SNOMED CT. A total of 4,841 potential subtype inconsistencies were found in GO, 2,677 in NCIt, and 53,782 in SNOMED CT. Domain experts evaluated a random sample of 211 potential inconsistencies in GO, and verified that 124 of them are valid (mathrm {i}.mathrm {e}., a precision of 58.77% for detecting subtype inconsistencies in GO). We also performed a preliminary study on the extent to which external knowledge in the Unified Medical Language System (UMLS) can provide supporting evidence for validating the detected potential inconsistencies: 0.54% (=26/4841) for GO, 11.43% (=306/2677) for NCIt, and 3.61% (=1940/53782) for SNOMED CT. Results indicate that our lexical-based inference approach is a promising way to identify subtype inconsistencies and facilitates the quality improvement of biomedical terminologies.

Original languageEnglish
Title of host publicationProceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
EditorsHarald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
Pages1982-1989
Number of pages8
ISBN (Electronic)9781538654880
DOIs
StatePublished - Jan 21 2019
Event2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: Dec 3 2018Dec 6 2018

Publication series

NameProceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

Conference

Conference2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Country/TerritorySpain
CityMadrid
Period12/3/1812/6/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Keywords

  • Gene Ontology
  • incorrect subtype relations
  • missing subtype relations
  • national Cancer Institute thesaurus
  • sNOMED CT
  • subtype inconsistencies
  • terminology quality assurance
  • unified Med-ical Language System

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Fingerprint

Dive into the research topics of 'A Lexical Approach to Identifying Subtype Inconsistencies in Biomedical Terminologies'. Together they form a unique fingerprint.

Cite this