Abstract
Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference union. Each pattern indicates a specific type of potential error and suggests a corresponding type of remediation. We found 809 non-lattice subgraphs exhibiting these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors and to yield correct remediation suggestions (33/50 = 66%). Of the 25 evaluated subgraphs exhibiting multiple patterns, 22 were verified as correct (22/25 = 88%). These results demonstrate the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus.
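The abstract does not spell out how non-lattice subgraphs are found. The sketch below is a minimal illustration that assumes the usual definition: a pair of concepts in an is-a hierarchy is a non-lattice pair when it has more than one maximal common descendant. It covers only this structural step. It does not implement the lexical patterns (containment, union, and so on) or the construction of the full subgraph. The toy hierarchy and the helper names (`ancestors`, `maximal_common_descendants`, `non_lattice_pairs`) are hypothetical and are not taken from the paper or the NCI Thesaurus.

```python
from itertools import combinations

# Hypothetical toy is-a hierarchy (child -> set of direct parents);
# these concept names are placeholders, not NCI Thesaurus concepts.
PARENTS = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"C", "D"},
}


def ancestors(concept, parents):
    """All strict ancestors of `concept`, found by walking parent links."""
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def maximal_common_descendants(a, b, parents, anc=None):
    """Common descendants of a and b with no other common descendant above them."""
    if anc is None:
        anc = {c: ancestors(c, parents) for c in parents}
    # A concept counts as a descendant of itself (reflexive is-a).
    common = {c for c in parents if {a, b} <= anc[c] | {c}}
    # Keep only those that have no other common descendant among their ancestors.
    return {c for c in common if not (anc[c] & common)}


def non_lattice_pairs(parents):
    """Map each concept pair with more than one maximal common descendant to that set."""
    anc = {c: ancestors(c, parents) for c in parents}
    result = {}
    for a, b in combinations(sorted(parents), 2):
        mcd = maximal_common_descendants(a, b, parents, anc)
        if len(mcd) > 1:
            result[(a, b)] = mcd
    return result


if __name__ == "__main__":
    for pair, mcd in non_lattice_pairs(PARENTS).items():
        print(pair, sorted(mcd))
```

In this toy hierarchy, only the pair (A, B) is reported. It has two maximal common descendants, C and D, so the script prints `('A', 'B') ['C', 'D']`. Under the assumed definition, a pair whose concepts are in an ancestor-descendant relation always has exactly one maximal common descendant: the lower concept. Such pairs therefore never appear in the result.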
| Original language | English |
|---|---|
| Pages (from-to) | 364-373 |
| Number of pages | 10 |
| Journal | AMIA ... Annual Symposium proceedings. AMIA Symposium |
| Volume | 2017 |
| State | Published - 2017 |
Funding
| Funders | Funder number |
|---|---|
| National Center for Advancing Translational Sciences (NCATS) | UL1TR001998 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
ASJC Scopus subject areas
- General Medicine
Fingerprint
Dive into the research topics of 'Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns'. Together they form a unique fingerprint.