TY - JOUR
T1 - Assigning ICD-O-3 Codes to Pathology Reports using Neural Multi-Task Training with Hierarchical Regularization
AU - Rios, Anthony
AU - Durbin, Eric B
AU - Hands, Isaac
AU - Kavuluru, Ramakanth
PY - 2021/8
Y1 - 2021/8
N2 - Tracking population-level cancer information is essential for researchers, clinicians, policymakers, and the public. Unfortunately, much of the information is stored as unstructured data in pathology reports. Thus, too process the information, we require either automated extraction techniques or manual curation. Moreover, many of the cancer-related concepts appear infrequently in real-world training datasets. Automated extraction is difficult because of the limited data. This study introduces a novel technique that incorporates structured expert knowledge to improve histology and topography code classification models. Using pathology reports collected from the Kentucky Cancer Registry, we introduce a novel multi-task training approach with hierarchical regularization that incorporates structured information about the International Classification of Diseases for Oncology, 3rd Edition classes to improve predictive performance. Overall, we find that our method improves both micro and macro F1. For macro F1, we achieve up to a 6% absolute improvement for topography codes and up to 4% absolute improvement for histology codes.
AB - Tracking population-level cancer information is essential for researchers, clinicians, policymakers, and the public. Unfortunately, much of the information is stored as unstructured data in pathology reports. Thus, too process the information, we require either automated extraction techniques or manual curation. Moreover, many of the cancer-related concepts appear infrequently in real-world training datasets. Automated extraction is difficult because of the limited data. This study introduces a novel technique that incorporates structured expert knowledge to improve histology and topography code classification models. Using pathology reports collected from the Kentucky Cancer Registry, we introduce a novel multi-task training approach with hierarchical regularization that incorporates structured information about the International Classification of Diseases for Oncology, 3rd Edition classes to improve predictive performance. Overall, we find that our method improves both micro and macro F1. For macro F1, we achieve up to a 6% absolute improvement for topography codes and up to 4% absolute improvement for histology codes.
U2 - 10.1145/3459930.3469541
DO - 10.1145/3459930.3469541
M3 - Article
C2 - 34541582
VL - 2021
JO - ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine
JF - ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine
ER -