An Intelligent Search & Retrieval System (IRIS) and Clinical and Research Repository for Decision Support Based on Machine Learning and Joint Kernel-based Supervised Hashing

David J. Foran, Wenjin Chen, Tahsin Kurc, Rajarshi Gupta, Jakub Roman Kaczmarzyk, Luke Austin Torre-Healy, Erich Bremer, Samuel Ajjarapu, Nhan Do, Gerald Harris, Antoinette Stroup, Eric Durbin, Joel H. Saltz

Research output: Contribution to journal › Article › peer-review

Abstract

Large-scale, multi-site collaboration is becoming indispensable for a wide range of research and clinical activities in oncology. To facilitate the next generation of advances in cancer biology, precision oncology and the population sciences, it will be necessary to develop and implement data management and analytic tools that empower investigators to reliably and objectively detect, characterize and chronicle the phenotypic and genomic changes that occur during the transformation from the benign to the cancerous state and throughout the course of disease progression. To facilitate these efforts, it is incumbent upon the informatics community to establish the workflows and architectures that automate the aggregation and organization of a growing range and number of clinical data types and modalities, ranging from new molecular and laboratory tests to sophisticated diagnostic imaging studies. In an attempt to meet those challenges, leading health care centers across the country are making steep investments to establish enterprise-wide data warehouses. A significant limitation of many data warehouses, however, is that they are designed to support only alphanumeric information. In contrast to those traditional designs, the system that we have developed supports automated collection and mining of multimodal data, including genomics, digital pathology and radiology images. In this paper, our team describes the design, development and implementation of a multi-modal Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide actionable insight into the underlying characteristics of the tumor environment that would not be revealed using standard methods and tools.
The system features a flexible Extract, Transform and Load (ETL) interface that allows it to aggregate data from different clinical and research sources, depending on the specific EHR and other data sources used at a given deployment site.

Original language: English
Journal: Cancer Informatics
Volume: 23
State: Published - Jan 1 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported, in part, by UH3-CA225021, U24-CA215109, UG3-CA225021, U24-CA180924-05, and 5UL1TR003017 grants from the National Institutes of Health and generous private support to Stony Brook from Bob Beals and Betsy Barton. Additional support was provided through funding from the U.S. Department of Veterans Affairs - Boston Healthcare System through contract, IPA-RU-092920. This work leveraged resources from XSEDE, which is supported by NSF ACI-1548562 grant, including the Bridges system (NSF ACI-1445606) at the Pittsburgh Supercomputing Center. Services, results and/or products in support of the research were generated by Rutgers Cancer Institute of New Jersey Biomedical Informatics Shared Resource NCI-CCSG 7P30CA072720-24.

Funders                                                          Funder number
Boston Veterans Healthcare System                                IPA-RU-092920
XSEDE
National Science Foundation Arctic Social Science Program        ACI-1445606, ACI-1548562
National Institutes of Health (NIH)
U.S. Department of Veterans Affairs
Rutgers Cancer Institute of New Jersey and Rutgers University    7P30CA072720-24

    Keywords

    • Multi-modal clinical research data warehouse
    • adaptable extraction
    • content based retrieval
    • decision support
    • large-scale multi-site collaboration
    • machine learning
    • transform and load interface

    ASJC Scopus subject areas

    • Oncology
    • Cancer Research
