Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns

Rashmie Abeysinghe, Michael A. Brooks, Jeffery Talbert, Licong Cui

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a specific type of potential error and suggests a potential type of remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors and to yield correct remediation suggestions (33/50 = 66%). Of the 25 evaluated subgraphs revealing multiple patterns, 22 were verified correct (22/25 = 88%). This shows the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus.
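
The abstract does not define a non-lattice subgraph, so the following is only a minimal sketch under a stated assumption: a pair of concepts is treated as a non-lattice pair when it has more than one maximal common descendant in the is-a hierarchy. The toy hierarchy `PARENTS` and all function names are hypothetical and are not taken from the paper or from the NCI Thesaurus; the lexical patterns (containment, union, and so on) are not modeled here.

```python
from itertools import combinations

# Hypothetical toy is-a hierarchy: concept -> set of direct parents.
PARENTS = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"C", "D"},
}


def ancestors(concept, parents):
    """Return all proper ancestors of a concept (transitive closure of is-a)."""
    seen = set()
    stack = list(parents[concept])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def maximal_common_descendants(a, b, anc):
    """Common descendants of a and b that have no ancestor among the common descendants."""
    common = {c for c in anc if a in anc[c] and b in anc[c]}
    return {c for c in common if not (anc[c] & common)}


def non_lattice_pairs(parents):
    """Assumed definition: pairs with more than one maximal common descendant."""
    anc = {c: ancestors(c, parents) for c in parents}
    result = []
    for a, b in combinations(sorted(parents), 2):
        mcd = maximal_common_descendants(a, b, anc)
        if len(mcd) > 1:
            result.append((a, b, mcd))
    return result


if __name__ == "__main__":
    for a, b, mcd in non_lattice_pairs(PARENTS):
        print(a, b, sorted(mcd))
```

In this toy hierarchy, A and B share the descendants C, D, and E, of which C and D are both maximal, so (A, B) is flagged. In a pattern-mining setting, each flagged pair would then be a candidate for lexical inspection of the concept names involved.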

Original language: English
Pages (from-to): 364-373
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume: 2017
State: Published - 2017

ASJC Scopus subject areas

  • General Medicine
