
Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns

  • Rashmie Abeysinghe
  • Michael A. Brooks
  • Jeffery Talbert
  • Cui Licong

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference union. Each pattern indicates a specific type of potential error and suggests a potential remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs. Of these, 33 were confirmed to contain errors and to make correct suggestions (33/50 = 66%). Of the 25 evaluated subgraphs revealing multiple patterns, 22 were verified correct (22/25 = 88%). These results show the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus.
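The abstract builds on non-lattice subgraphs. The sketch below is a minimal illustration of the structural half of that idea. It assumes the common definition of a non-lattice pair: two concepts in an is-a hierarchy that have more than one maximal common descendant. It finds such pairs in a small toy hierarchy. The concept names, the `PARENTS` data, the function names, and the word-union heuristic are all hypothetical illustrations. They are not NCI Thesaurus content, not the authors' implementation, and not the paper's exact definitions of the six patterns.

```python
from itertools import combinations

# Hypothetical toy is-a hierarchy (NOT actual NCI Thesaurus content):
# each concept maps to the set of its direct parents.
PARENTS: dict[str, set[str]] = {
    "Neoplasm": set(),
    "Lung Neoplasm": {"Neoplasm"},
    "Malignant Neoplasm": {"Neoplasm"},
    "Lung Carcinoma": {"Lung Neoplasm", "Malignant Neoplasm"},
    "Lung Sarcoma": {"Lung Neoplasm", "Malignant Neoplasm"},
}


def build_children(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the parent map into a child map."""
    children: dict[str, set[str]] = {c: set() for c in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children


def descendants(concept: str, children: dict[str, set[str]]) -> set[str]:
    """Return the concept itself plus all of its transitive descendants."""
    seen: set[str] = set()
    stack = [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children[c])
    return seen


def maximal_common_descendants(a: str, b: str, children: dict[str, set[str]]) -> set[str]:
    """Common descendants of a and b that are not below another common descendant."""
    common = descendants(a, children) & descendants(b, children)
    return {
        d for d in common
        if not any(d != other and d in descendants(other, children) for other in common)
    }


def non_lattice_pairs(parents: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Pairs of concepts with more than one maximal common descendant."""
    children = build_children(parents)
    pairs = []
    for a, b in combinations(sorted(parents), 2):
        mcds = maximal_common_descendants(a, b, children)
        if len(mcds) > 1:
            pairs.append((a, b, mcds))
    return pairs


def suggested_missing_concept_words(a: str, b: str) -> set[str]:
    """Hypothetical lexical heuristic: the union of the pair's lowercase words."""
    return set(a.lower().split()) | set(b.lower().split())


if __name__ == "__main__":
    for a, b, mcds in non_lattice_pairs(PARENTS):
        print(a, "|", b, "->", sorted(mcds))
        print("  candidate missing concept words:", sorted(suggested_missing_concept_words(a, b)))
```

In this toy hierarchy, the only non-lattice pair is ("Lung Neoplasm", "Malignant Neoplasm"). Its two maximal common descendants are "Lung Carcinoma" and "Lung Sarcoma". The word union {lung, malignant, neoplasm} hints that an intermediate concept such as "Malignant Lung Neoplasm" may be missing. This is the kind of missing concept the abstract says a structural-lexical pattern can point to. The lexical rule here is a simplified stand-in chosen for illustration only.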

Original language: English
Pages (from-to): 364-373
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume: 2017
State: Published - 2017

Funding

Funder: National Center for Advancing Translational Sciences (NCATS), funder number UL1TR001998

    UN SDGs

    This output contributes to the following UN Sustainable Development Goals (SDGs)

    1. SDG 3 - Good Health and Well-being

    ASJC Scopus subject areas

    • General Medicine

