Abstract
Large-scale, multi-site collaboration is becoming indispensable for a wide range of research and clinical activities in oncology. To facilitate the next generation of advances in cancer biology, precision oncology and the population sciences, it will be necessary to develop and implement data management and analytic tools that empower investigators to reliably and objectively detect, characterize and chronicle the phenotypic and genomic changes that occur during the transformation from the benign to the cancerous state and throughout the course of disease progression. To support these efforts, it is incumbent upon the informatics community to establish the workflows and architectures that automate the aggregation and organization of a growing range and number of clinical data types and modalities, from new molecular and laboratory tests to sophisticated diagnostic imaging studies. To meet those challenges, leading health care centers across the country are making steep investments to establish enterprise-wide data warehouses. A significant limitation of many data warehouses, however, is that they are designed to support only alphanumeric information. In contrast to those traditional designs, the system that we have developed supports automated collection and mining of multimodal data including genomics, digital pathology and radiology images. In this paper, our team describes the design, development and implementation of a multi-modal Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide actionable insight into the underlying characteristics of the tumor environment that would not be revealed using standard methods and tools.
The system features a flexible Extract, Transform and Load (ETL) interface that lets it aggregate data from different clinical and research sources, depending on the specific EHR and other data sources in use at a given deployment site.
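
As a rough illustration of what a site-adaptable ETL interface can look like, the sketch below defines one adapter per data source and a loader that treats all adapters uniformly. This is not the CRDW implementation: the class names (`SourceAdapter`, `ExampleEhrAdapter`, `ExampleImagingAdapter`), the `Record` shape and the field names (`mrn`, `diagnosis`, `subject`, `uri`) are all hypothetical and chosen only for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Record:
    """Common record shape loaded into the warehouse (hypothetical)."""
    patient_id: str
    modality: str            # e.g. "clinical", "radiology"
    payload: Dict[str, str]


class SourceAdapter(ABC):
    """One adapter per site-specific source; only adapters change per site."""

    @abstractmethod
    def extract(self) -> Iterable[Dict[str, str]]:
        """Yield raw rows in the source's own format."""

    @abstractmethod
    def transform(self, raw: Dict[str, str]) -> Record:
        """Map one raw row onto the common Record shape."""


class ExampleEhrAdapter(SourceAdapter):
    """Hypothetical EHR export that keys patients by 'mrn'."""

    def __init__(self, rows: List[Dict[str, str]]):
        self._rows = rows

    def extract(self) -> Iterable[Dict[str, str]]:
        return iter(self._rows)

    def transform(self, raw: Dict[str, str]) -> Record:
        return Record(raw["mrn"], "clinical", {"diagnosis": raw["diagnosis"]})


class ExampleImagingAdapter(SourceAdapter):
    """Hypothetical imaging index that keys patients by 'subject'."""

    def __init__(self, rows: List[Dict[str, str]]):
        self._rows = rows

    def extract(self) -> Iterable[Dict[str, str]]:
        return iter(self._rows)

    def transform(self, raw: Dict[str, str]) -> Record:
        return Record(raw["subject"], "radiology", {"uri": raw["uri"]})


def run_etl(adapters: Iterable[SourceAdapter]) -> List[Record]:
    """Extract and transform from every adapter, then load into one list."""
    warehouse: List[Record] = []
    for adapter in adapters:
        for raw in adapter.extract():
            warehouse.append(adapter.transform(raw))
    return warehouse


records = run_etl([
    ExampleEhrAdapter([{"mrn": "A1", "diagnosis": "C50.9"}]),
    ExampleImagingAdapter([{"subject": "A1", "uri": "scan-001"}]),
])
```

Under this design, deploying at a site with a different EHR would mean writing a new adapter, while the loading code and the common record shape stay unchanged.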
Original language | English |
---|---|
Journal | Cancer Informatics |
Volume | 23 |
State | Published - Jan 1 2024 |
Bibliographical note
Publisher Copyright: © The Author(s) 2024.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported, in part, by UH3-CA225021, U24-CA215109, UG3-CA225021, U24-CA180924-05, and 5UL1TR003017 grants from the National Institutes of Health and generous private support to Stony Brook from Bob Beals and Betsy Barton. Additional support was provided through funding from the U.S. Department of Veterans Affairs - Boston Healthcare System through contract, IPA-RU-092920. This work leveraged resources from XSEDE, which is supported by NSF ACI-1548562 grant, including the Bridges system (NSF ACI-1445606) at the Pittsburgh Supercomputing Center. Services, results and/or products in support of the research were generated by Rutgers Cancer Institute of New Jersey Biomedical Informatics Shared Resource NCI-CCSG 7P30CA072720-24.
Funders | Funder number |
---|---|
Boston Veterans Healthcare System | IPA-RU-092920 |
XSEDE | |
National Science Foundation | ACI-1445606, ACI-1548562 |
National Institutes of Health (NIH) | UH3-CA225021, U24-CA215109, UG3-CA225021, U24-CA180924-05, 5UL1TR003017 |
U.S. Department of Veterans Affairs | |
Rutgers Cancer Institute of New Jersey and Rutgers University | 7P30CA072720-24 |
Keywords
- Multi-modal clinical research data warehouse
- adaptable extraction, transform and load interface
- content based retrieval
- decision support
- large-scale multi-site collaboration
- machine learning
ASJC Scopus subject areas
- Oncology
- Cancer Research