Deep transfer learning across cancer registries for information extraction from pathology reports

Mohammed Alawad, Shang Gao, John Qiu, Noah Schaefferkoetter, Jacob D. Hinkle, Hong Jun Yoon, J. Blair Christian, Xiao Cheng Wu, Eric B. Durbin, Jong Cheol Jeong, Isaac Hands, David Rust, Georgia Tourassi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

Automated text information extraction from cancer pathology reports is an active area of research to support national cancer surveillance. A well-known challenge is how to develop information extraction tools with robust performance across cancer registries. In this study, we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained with single-registry data is capable of transferring knowledge to another registry, or whether developing a cross-registry knowledge database produces a more effective and generalizable model. Using data from two cancer registries, with primary tumor site and topography as the information extraction task of interest, our study showed that TL results in 6.90% and 17.22% improvements in classification macro F-score over the baseline single-registry models. Detailed analysis illustrated that the observed improvement is evident in the low-prevalence classes.
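The abstract reports gains in macro F-score and ties them to the low-prevalence classes. The sketch below is illustrative only and is not the authors' evaluation code: it computes macro and micro F-score on a made-up toy label set to show why macro averaging is sensitive to rare classes. The function names, the site codes used as labels, and the predictions are all assumptions introduced for this example.

```python
def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def micro_f1(y_true, y_pred):
    """For single-label multiclass data, micro F1 equals accuracy."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy data: one common site label and one rare site label.
y_true = ["C50"] * 8 + ["C37"] * 2
# A classifier that always predicts the common class.
y_pred = ["C50"] * 10

print(round(micro_f1(y_true, y_pred), 3))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

In this toy case the always-majority classifier looks acceptable under micro averaging but scores poorly under macro averaging, because the rare class contributes an F1 of zero with the same weight as the common class. That is why an improvement concentrated in low-prevalence classes shows up clearly in a macro F-score.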

Original language: English
Title of host publication: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings
ISBN (Electronic): 9781728108483
State: Published - May 2019
Event: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Chicago, United States
Duration: May 19 2019 to May 22 2019

Publication series

Name: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings

Conference

Conference: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019
Country/Territory: United States
City: Chicago
Period: 5/19/19 to 5/22/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Funding

This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DEAC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This work has also been supported by the National Cancer Institute under Contract No. HHSN261201800013I and by NCI Cancer Center Support Grant P30CA177558.

Funders and funder numbers
National Institutes of Health (NIH)
Michigan State University-U.S. Department of Energy (MSU-DOE) Plant Research Laboratory
National Childhood Cancer Registry – National Cancer Institute: P30CA177558, HHSN261201800013I
Argonne National Laboratory: DE-AC02-06-CH11357
Lawrence Livermore National Laboratory: DEAC52-07NA27344
Oak Ridge National Laboratory: DE-AC05-00OR22725
Los Alamos National Laboratory: DE-AC5206NA25396

    Keywords

    • Convolutional neural network
    • Information extraction
    • NLP
    • Pathology reports
    • Transfer learning

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Signal Processing
    • Information Systems and Management
    • Biomedical Engineering
    • Health Informatics
    • Radiology, Nuclear Medicine and Imaging
