TY - JOUR
T1 - The National COVID Cohort Collaborative (N3C)
T2 - Rationale, design, infrastructure, and deployment
AU - Haendel, Melissa A.
AU - Chute, Christopher G.
AU - Bennett, Tellen D.
AU - Eichmann, David A.
AU - Guinney, Justin
AU - Kibbe, Warren A.
AU - Payne, Philip R.O.
AU - Pfaff, Emily R.
AU - Robinson, Peter N.
AU - Saltz, Joel H.
AU - Spratt, Heidi
AU - Suver, Christine
AU - Wilbanks, John
AU - Wilcox, Adam B.
AU - Williams, Andrew E.
AU - Wu, Chunlei
AU - Blacketer, Clair
AU - Bradford, Robert L.
AU - Cimino, James J.
AU - Clark, Marshall
AU - Colmenares, Evan W.
AU - Francis, Patricia A.
AU - Gabriel, Davera
AU - Graves, Alexis
AU - Hemadri, Raju
AU - Hong, Stephanie S.
AU - Hripcsak, George
AU - Jiao, Dazhi
AU - Klann, Jeffrey G.
AU - Kostka, Kristin
AU - Lee, Adam M.
AU - Lehmann, Harold P.
AU - Lingrey, Lora
AU - Miller, Robert T.
AU - Morris, Michele
AU - Murphy, Shawn N.
AU - Natarajan, Karthik
AU - Palchuk, Matvey B.
AU - Sheikh, Usman
AU - Solbrig, Harold
AU - Visweswaran, Shyam
AU - Walden, Anita
AU - Walters, Kellie M.
AU - Weber, Griffin M.
AU - Zhang, Xiaohan Tanner
AU - Zhu, Richard L.
AU - Amor, Benjamin
AU - Girvin, Andrew T.
AU - Manna, Amin
AU - Kavuluru, Ramakanth
N1 - Publisher Copyright: © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2021/3/1
Y1 - 2021/3/1
AB - Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
KW - COVID-19
KW - EHR data
KW - SARS-CoV-2
KW - clinical data model harmonization
KW - collaborative analytics
KW - open science
UR - http://www.scopus.com/inward/record.url?scp=85091994547&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091994547&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocaa196
DO - 10.1093/jamia/ocaa196
M3 - Article
C2 - 32805036
AN - SCOPUS:85091994547
SN - 1067-5027
VL - 28
SP - 427
EP - 443
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 3
ER -