A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program

Adrienne M. Stilp, Leslie S. Emery, Jai G. Broome, Erin J. Buth, Alyna T. Khan, Cecelia A. Laurie, Fei Fei Wang, Quenna Wong, Dongquan Chen, Catherine M. D’Augustine, Nancy L. Heard-Costa, Chancellor R. Hohensee, William Craig Johnson, Lucia D. Juarez, Jingmin Liu, Karen M. Mutalik, Laura M. Raffield, Kerri L. Wiggins, Paul S. de Vries, Tanika N. Kelly, Charles Kooperberg, Pradeep Natarajan, Gina M. Peloso, Patricia A. Peyser, Alex P. Reiner, Donna K. Arnett, Stella Aslibekyan, Kathleen C. Barnes, Lawrence F. Bielak, Joshua C. Bis, Brian E. Cade, Ming Huei Chen, Adolfo Correa, L. Adrienne Cupples, Mariza de Andrade, Patrick T. Ellinor, Myriam Fornage, Nora Franceschini, Weiniu Gan, Santhi K. Ganesh, Jan Graffelman, Megan L. Grove, Xiuqing Guo, Nicola L. Hawley, Wan Ling Hsu, Rebecca D. Jackson, Cashell E. Jaquish, Andrew D. Johnson, Sharon L.R. Kardia, Shannon Kelly, Jiwon Lee, Rasika A. Mathias, Stephen T. McGarvey, Braxton D. Mitchell, May E. Montasser, Alanna C. Morrison, Kari E. North, Seyed Mehdi Nouraie, Elizabeth C. Oelsner, Nathan Pankratz, Stephen S. Rich, Jerome I. Rotter, Jennifer A. Smith, Kent D. Taylor, Ramachandran S. Vasan, Daniel E. Weeks, Scott T. Weiss, Carla G. Wilson, Lisa R. Yanek, Bruce M. Psaty, Susan R. Heckbert, Cathy C. Laurie

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.

Original language: English
Pages (from-to): 1977-1992
Number of pages: 16
Journal: American Journal of Epidemiology
Volume: 190
Issue number: 10
State: Published - 2021

Bibliographical note

Publisher Copyright:
© The Author(s) 2021.

Funding

The harmonized data presented in this paper have been submitted to the database of Genotypes and Phenotypes (dbGaP) and the NHLBI BioData Catalyst. The software code with which to reproduce the harmonized phenotypes presented in this paper from dbGaP files is available on GitHub (14). See the “Data Availability” section of the text for details. We gratefully acknowledge the researchers and study participants who provided biological samples and data for TOPMed. We acknowledge Drs. Mike Feolo and Masato Kimura for making the harmonized phenotype data and the phenotype tagging data available in dbGaP. 
We also acknowledge contributors to the overall TOPMed project, who can be found on the TOPMed Data Coordinating Center website (https://www.nhlbiwgs.org/topmed-banner-authorship). The Genetics of Cardiometabolic Health in the Amish Study investigators gratefully thank the Amish community and research volunteers for their long-standing partnership, and they acknowledge the dedication of their Amish liaisons, fieldworkers, and the Amish Research Clinic staff, without whom these studies would not have been possible. The ARIC Study investigators thank the study staff and participants for their important contributions. The Framingham Heart Study investigators acknowledge the dedication of the study participants, without whom this research would not have been possible. The Jackson Heart Study investigators thank the study staff and participants. The Samoan Adiposity Study investigators thank the Samoan participants in the study and local village authorities. They acknowledge the Samoan Ministry of Health and the Samoa Bureau of Statistics for their support of this research. This work was funded by numerous grants and contracts from the National Institutes of Health (NIH), US Department of Health and Human Services. The Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung, and Blood Institute (NHLBI), NIH, with core services provided by the TOPMed Informatics Research Center (award 3R01HL-117626-02S1; contract HHSN268201800002I) and the TOPMed Data Coordinating Center (awards R01HL-120393 and U01HL-120393; contract HHSN268201800001I). Whole-genome sequencing for TOPMed was supported by the NHLBI. Phenotype harmonization activities were funded in part by the NHLBI (contract HHSN26820180001I). Additional harmonization funding was provided by the NHLBI (grant 5 U01 HL 120393-04). 
Phenotype variable tagging was funded by the NHLBI (grant supplement 3 U01 HL 120393-04S2) and the NIH Office of the Director as part of the NIH Data Commons Pilot Phase Consortium. Additional financial support was provided to some authors: N.F. was additionally supported by NIH grants R01-MD012765, R01-DK117445, and R21-HL140385. P.T.E. was additionally supported by NIH grants R01HL092577, R01HL128914, and K24HL105780. A.P.R. was additionally supported by NIH grant R01HL130733. P.S.d.V. was additionally supported by American Heart Association grant 18CDA34110116. E.C.O. was additionally supported by the NHLBI Pooled Cohorts Study and NIH grants R21-HL129924 and K23-HL130627. S.K.G. was additionally supported by NIH grants R01HL122684 and R01HL139672. B.E.C. was additionally supported by NIH grant K01-HL135405. P.N. and G.M.P. were additionally supported by NIH grant R01HL142711. R.S.V. was supported in part by the Evans Medical Foundation and the Jay and Louis Coffman Endowment from the Department of Medicine, Boston University School of Medicine. Financial support for individual TOPMed studies was provided as follows. Genetics of Cardiometabolic Health in the Amish: The TOPMed component of the Amish Research Program was supported by NIH grants R01 HL121007, U01 HL072515, and R01 AG18728. Atherosclerosis Risk in Communities (ARIC) Study: The ARIC Study has been funded in whole or in part by the NHLBI (contracts HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, and HHSN268201700005I). Coronary Artery Risk Development in Young Adults (CARDIA) Study: The CARDIA Study is conducted and supported by the NHLBI in collaboration with the University of Alabama at Birmingham (awards HHSN268201800005I and HHSN268201800007I), Northwestern University (award HHSN268201800003I), the University of Minnesota (award HHSN268201800006I), and the Kaiser Foundation Research Institute (award HHSN268201800004I). 
CARDIA is also partially supported by the Intramural Research Program of the National Institute on Aging and an Intra-Agency Agreement (agreement AG0005) between the National Institute on Aging and the NHLBI. Cleveland Family Study: The Cleveland Family Study has been supported in part by the NIH (grants R01-HL046380, KL2-RR024990, R35-HL135818, and R01-HL113338). Cardiovascular Health Study (CHS): The CHS was supported by contracts HHSN268201200036C, HHSN268200800007C, HHSN268201800001C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, and N01HC85086 and grants U01HL080295 and U01HL130114 from the NHLBI, with additional contributions from the National Institute of Neurological Disorders and Stroke. Additional support was provided by the National Institute on Aging (award R01AG023629). Genetic Epidemiology of COPD Study (COPDGene): The COPDGene project was supported by awards U01 HL089897 and U01 HL089856 from the NHLBI. The COPDGene project is also supported by the COPD Foundation through contributions made to an industry advisory board comprised of AstraZeneca AB (Cambridge, United Kingdom), Boehringer Ingelheim (Ingelheim am Rhein, Germany), GlaxoSmithKline plc (London, United Kingdom), Novartis International AG (Basel, Switzerland), Pfizer, Inc. (New York, New York), Siemens Healthcare GmbH (Erlangen, Germany), and Sunovion Pharmaceuticals Inc. (Marlborough, Massachusetts). Genetic Epidemiology of Asthma in Costa Rica (CRA) Study: The CRA Study was funded by the NHLBI (grants R37 HL066289-14 and P01 HL132825). Framingham Heart Study: The Framingham Heart Study was supported by contracts NO1-HC-25195, HHSN268201500001I, and 75N92019D00031 from the NHLBI and by grant supplement R01 HL092577-06S1 for this research. Genetic Epidemiology Network of Arteriopathy (GENOA): Support for GENOA was provided by the NHLBI (awards HL054457, HL054464, HL054481, HL119443, HL087660, and HL085571). 
Genetics of Lipid Lowering Drugs and Diet Network (GOLDN): GOLDN biospecimens, baseline phenotype data, and intervention phenotype data were collected with funding from the NHLBI (grant U01 HL072524). Whole-genome sequencing in GOLDN was funded by the NHLBI (grant R01 HL104135 and grant supplement R01 HL104135-04S1). Hispanic Community Health Study/Study of Latinos (HCHS/SOL): The HCHS/SOL is a collaborative study supported by contracts between the NHLBI and the University of North Carolina (contract HHSN268201300001I/N01-HC-65233), the University of Miami (contract HHSN268201300004I/N01-HC-65234), the Albert Einstein College of Medicine (contract HHSN268201300002I/N01-HC-65235), and the University of Illinois at Chicago (contract HHSN268201300003I/N01-HC-65236 Northwestern University), and San Diego State University (contract HHSN268201300005I/N01-HC-65237). The following institutions have contributed to the HCHS/SOL through a transfer of funds to the NHLBI: the National Institute on Minority Health and Health Disparities, the National Institute on Deafness and Other Communication Disorders, the National Institute of Dental and Craniofacial Research, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the NIH Office of Dietary Supplements. Heart and Vascular Health Study: The Heart and Vascular Health Study was supported by the NHLBI (grants HL068986, HL085251, HL095080, and HL073410). Jackson Heart Study: The Jackson Heart Study is supported by and conducted in collaboration with Jackson State University (contract HHSN268201800013I), Tougaloo College (contract HHSN268201800014I), the Mississippi State Department of Health (contract HHSN268201800015I), and the University of Mississippi Medical Center (contracts HHSN268201800010I, HHSN268201800011I, and HHSN268201800012I) through contracts from the NHLBI and the National Institute on Minority Health and Health Disparities. 
Mayo Clinic Venous Thromboembolism Study: The Mayo Clinic Venous Thromboembolism Study was funded, in part, by the NHLBI (grants HL66216 and HL83141), the National Human Genome Research Institute (grants HG04735 and HG06379), and the Mayo Foundation. Multi-Ethnic Study of Atherosclerosis (MESA): Whole-genome sequencing for MESA (dbGaP accession number phs001416.v1.p1) was performed at the Broad Institute of MIT and Harvard (Cambridge, Massachusetts) (award 3U54HG003067-13S1). Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (award 3R01HL-117626-02S1). Phenotype harmonization, data management, sample-identity quality control, and general study coordination were provided by the TOPMed Data Coordinating Center (award 3R01HL-120393-02S1). MESA and the MESA SHARe project are conducted and supported by the NHLBI in collaboration with the MESA investigators. Support for MESA is provided by NIH contracts 75N92020D00001 (NHLBI), HHSN268201500003I (NHLBI), N01-HC-95159 (NHLBI), 75N92020D00005 (NHLBI), N01-HC-95160 (NHLBI), 75N92020D00002 (NHLBI), N01-HC-95161 (NHLBI), 75N92020D00003 (NHLBI), N01-HC-95162 (NHLBI), 75N92020D00006 (NHLBI), N01-HC-95163 (NHLBI), 75N92020D00004 (NHLBI), N01-HC-95164 (NHLBI), 75N92020D00007 (NHLBI), N01-HC-95165 (NHLBI), N01-HC-95166 (NHLBI), N01-HC-95167 (NHLBI), N01-HC-95168 (NHLBI), N01-HC-95169 (NHLBI), UL1-TR-000040 (National Center for Advancing Translational Sciences (NCATS) (Clinical and Translational Science Institute (CTSI))), UL1-TR-001079 (NCATS (CTSI)), UL1-TR-001420 (NCATS (CTSI)), UL1-TR-001881 (NCATS (CTSI)), and DK063491 (National Institute of Diabetes and Digestive and Kidney Diseases). Funding for SHARe genotyping was provided by NHLBI contract N02-HL-64278. Genotyping was performed at Affymetrix, Inc. (Santa Clara, California) and the Broad Institute of MIT and Harvard using the Affymetrix Genome-Wide Human SNP Array 6.0. 
Samoan Adiposity Study: Data collection for the Samoan Adiposity Study was funded by NIH grant R01-HL093093. Women’s Health Initiative: The Women’s Health Initiative program is funded by the NHLBI (contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C).

Funders and funder numbers
Mayo Foundation for Medical Education and Research
Office of Dietary Supplements
Broad Institute
Department of Medicine, Georgetown University
University of Illinois, Chicago
Boehringer-Ingelheim
Institute of Neurological Disorders and Stroke National Advisory Neurological Disorders and Stroke Council
Siemens Healthcare GmbH
Samoa Bureau of Statistics
Novartis
COPD Foundation
National Institute on Deafness and Other Communication Disorders
GlaxoSmithKline
U.S. Department of Health and Human Services
Clinical and Translational Science Institute, University of Pittsburgh
National Institute of Dental and Craniofacial Research
Pfizer
Samoan Ministry of Health
Evans Medical Foundation
National Heart, Lung, and Blood Institute (NHLBI): K01HL135405, P01HL132825, R21HL140385, R01HL113338, U01HL089897, R43HL095161, R43HL095160, K23HL130627, U10HL054464, R01HL146860, R01HL120393, R13HL095166, R01HL128914, R43HL095167, R01HL121007, R35HL135818, U01HL065233, R03HL141439, R01HL092577, U10HL054457, U01HL089856, R01HL085251, R01HL085571, R01HL104135, R01HL142711, R21HL129924, R01HL095080, K24HL105780, R37HL066289, ZIAHL006170, U01HL072515, R01HL095163, R01HL073410, U01HL080295, R01HL085083, R01HL065234, R01HL068986, R01HL083141, R44HL095169, R01HL087660, U01HL130114, U01HL072524, R01HL139672, R01HL093093, R01HL119443, R01HL117626, R01HL130733, R01HL066216, R01HL122684, R01HL064278, R21HL095165, R01HL046380, U01HL054481
National Institute on Aging: N01HC85081, N01HC85082, AG0005, R01AG023629, N01HC85080, R01AG018728, N01HC85086, N01HC85083, N01HC85079, HHSN268200800007C, N01HC55222
American Heart Association: R01HL142711, K23-HL130627, 18CDA34110116, R01HL139672, R01HL122684, R21-HL129924, K01-HL135405
National Institutes of Health/National Institute of Environmental Health Sciences: P30ES000002
Jackson State University: HHSN268201800013I
National Institute of Diabetes and Digestive and Kidney Diseases: R01DK117445, P30DK063491, HHSN268201600018C
National Center for Advancing Translational Sciences (NCATS): UL1TR001079, UL1TR001420, UL1TR001881, UL1TR000040
National Human Genome Research Institute: U01HG006379, U01HG004735, U54HG003067
National Center for Research Resources: KL2RR024990
Tougaloo College: HHSN268201800014I
National Institute on Minority Health and Health Disparities (NIMHD): R01MD012765
National Institutes of Health (NIH): N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95160, N01-HC-95169, N01-HC-95159, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, 75N92020D00001, 75N92020D00002, 75N92020D00005, 75N92020D00006, 75N92020D00003, 75N92020D00004
Mississippi State Department of Health: HHSN268201800015I
National Institute on Drug Abuse: Z01DA000004
Sunovion Pharmaceuticals Inc: HL087660, R01 HL104135, HL085571, R37 HL066289-14, P01 HL132825, HL054457, U01 HL072524, NO1-HC-25195, HL054481, 75N92019D00031, HHSN268201500001I, HL054464, R01 HL104135-04S1, R01 HL092577-06S1
TOPMed Informatics Research Center: U01HL-120393, K24HL105780, R21-HL140385, R01-MD012765, R01HL-120393, 3R01HL-117626-02S1, HHSN26820180001I, 5 U01 HL 120393-04, R01HL130733, R01HL092577, R01-DK117445, HHSN268201800001I, HHSN268201800002I, R01HL128914, 3 U01 HL 120393-04S2
Indiana Clinical and Translational Sciences Institute: N02-HL-64278, UL1-TR-001881, UL1-TR-001420, UL1-TR-001079, HHSN268201600002C, HHSN268201600003C, DK063491, HHSN268201600001C, HHSN268201600018C, HHSN268201600004C, R01-HL093093
Miami Clinical and Translational Science Institute, University of Miami: HHSN268201300004I/N01-HC-65234
Northwestern Polytechnical University: HHSN268201800003I
Harvard Transdisciplinary Research in Energetics and Cancer Center, Harvard University: UL1-TR-000040, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95160, N01-HC-95169, N01-HC-95159, 3U54HG003067-13S1, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, 3R01HL-120393-02S1, HHSN268201500003I
University of Alabama, Birmingham: HHSN268201800005I, HHSN268201800007I
Kaiser Foundation Research Institute: HHSN268201800004I
Minnesota State University-Mankato: HHSN268201800006I
University of North Carolina Wilmington: HHSN268201300001I/N01-HC-65233
Albert Einstein Cancer Center of the Albert Einstein College of Medicine of Yeshiva University: HHSN268201300002I/N01-HC-65235
University of Mississippi Medical Center: HHSN268201800012I, HHSN268201800011I, HHSN268201800010I
San Diego State University: HHSN268201300005I/N01-HC-65237
Department of Family Medicine School of Medicine Boston University School of Medicine and Boston Medical Center Boston University Institute for Health Systems Innovation and Policy: R01 HL121007, U01 HL072515, HHSN268201700003I, HHSN268201700004I, HHSN268201700005I, R01 AG18728, HHSN268201700001I, HHSN268201700002I

    Keywords

    • Cardiovascular disease
    • Common data elements
    • Hematologic disease
    • Information dissemination
    • Lung diseases
    • Phenotypes
    • Sleep-wake disorders

    ASJC Scopus subject areas

    • General Medicine
