TY - JOUR
T1 - Enhancing the quality of hierarchic relations in the National Cancer Institute Thesaurus to enable faceted query of cancer registry data
AU - Cui, Licong
AU - Abeysinghe, Rashmie
AU - Zheng, Fengbo
AU - Tao, Shiqiang
AU - Zeng, Ningzhou
AU - Hands, Isaac
AU - Durbin, Eric B.
AU - Whiteman, Lori
AU - Remennik, Lyubov
AU - Sioutos, Nicholas
AU - Zhang, Guo Qiang
N1 - Publisher Copyright: © 2020 by American Society of Clinical Oncology. Licensed under the Creative Commons Attribution 4.0 License.
PY - 2020
Y1 - 2020
N2 - PURPOSE: To audit and improve the completeness of the hierarchic (or is-a) relations of the National Cancer Institute (NCI) Thesaurus to support its role as a faceted system for querying cancer registry data. METHODS: We performed quality auditing of the 19.01d version of the NCI Thesaurus. Our hybrid auditing method consisted of three main steps: computing nonlattice subgraphs, constructing lexical features for concepts in each subgraph, and performing subsumption reasoning with each subgraph to automatically suggest potentially missing is-a relations. RESULTS: A total of 9,512 nonlattice subgraphs were obtained. Our method identified 925 potentially missing is-a relations in 441 nonlattice subgraphs; 72 of 176 reviewed samples were confirmed as valid missing is-a relations and have been incorporated in the newer versions of the NCI Thesaurus. CONCLUSION: Autosuggested changes resulting from our auditing method can improve the structural organization of the NCI Thesaurus in supporting its new role for faceted query.
UR - http://www.scopus.com/inward/record.url?scp=85084409372&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084409372&partnerID=8YFLogxK
U2 - 10.1200/CCI.19.00124
DO - 10.1200/CCI.19.00124
M3 - Article
C2 - 32374632
AN - SCOPUS:85084409372
SN - 2473-4276
VL - 4
SP - 392
EP - 398
JO - JCO Clinical Cancer Informatics
JF - JCO Clinical Cancer Informatics
ER -