Exome-wide evaluation of rare coding variants using electronic health records identifies new gene–phenotype associations

Joseph Park, Anastasia M. Lucas, Xinyuan Zhang, Kumardeep Chaudhary, Judy H. Cho, Girish Nadkarni, Amanda Dobbyn, Geetha Chittoor, Navya S. Josyula, Nathan Katz, Joseph H. Breeyear, Shadi Ahmadmehrabi, Theodore G. Drivas, Venkata R.M. Chavali, Maria Fasolino, Hisashi Sawada, Alan Daugherty, Yanming Li, Chen Zhang, Yuki Bradford, Jo Ellen Weaver, Anurag Verma, Renae L. Judy, Rachel L. Kember, John D. Overton, Jeffrey G. Reid, Manuel A.R. Ferreira, Alexander H. Li, Aris Baras, Scott A. LeMaire, Ying H. Shen, Ali Naji, Klaus H. Kaestner, Golnaz Vahedi, Todd L. Edwards, Jinbo Chen, Scott M. Damrauer, Anne E. Justice, Ron Do, Marylyn D. Ritchie, Daniel J. Rader

Research output: Contribution to journal › Article › peer-review

23 Scopus citations


The clinical impact of rare loss-of-function variants has yet to be determined for most genes. Integration of DNA sequencing data with electronic health records (EHRs) could enhance our understanding of the contribution of rare genetic variation to human disease¹. By leveraging 10,900 whole-exome sequences linked to EHR data in the Penn Medicine Biobank, we assessed, on an exome-wide scale, the association between the cumulative effect of rare predicted loss-of-function variants in each individual gene and human disease, as captured by a diverse set of EHR phenotypes. After discovering 97 genes with exome-by-phenome-wide significant phenotype associations (P < 10⁻⁶), we replicated 26 of these in the Penn Medicine Biobank, as well as in three other medical biobanks and the population-based UK Biobank. Of these 26 genes, five had previously reported associations and served as positive controls, whereas 21 had phenotype associations not previously reported, including genes implicated in glaucoma, aortic ectasia, diabetes mellitus, muscular dystrophy and hearing loss. These findings show the value of aggregating rare predicted loss-of-function variants into ‘gene burdens’ for identifying new gene–disease associations using EHR phenotypes in a medical biobank. We suggest that applying this approach to even larger numbers of individuals will provide the statistical power required to uncover unexplored relationships between rare genetic variation and disease phenotypes.
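The central idea of the abstract is to collapse every rare predicted loss-of-function (pLoF) variant in a gene into a single per-person "gene burden" flag and then test that flag against an EHR phenotype. A minimal sketch of that idea follows. It is a simplified illustration, not the study's pipeline. The input layout and all names (`burden_carriers`, `contingency`, `fisher_exact_two_sided`, `burden_test`, `EXOME_PHENOME_ALPHA`) are hypothetical. Variant filtering (allele-frequency cutoff, pLoF annotation) is assumed to have been done upstream. An unadjusted two-sided Fisher's exact test stands in for whatever covariate-adjusted model the authors actually used. Only the 10⁻⁶ cutoff is taken from the abstract.

```python
from math import comb
from typing import Dict, Set, Tuple

# Simplified sketch with hypothetical helper names; not the study's pipeline.
# Exome-by-phenome-wide significance threshold quoted in the abstract.
EXOME_PHENOME_ALPHA = 1e-6


def burden_carriers(plof_genes_by_person: Dict[str, Set[str]], gene: str) -> Set[str]:
    """Collapse all rare pLoF variants in `gene` into one carrier flag per person.

    Assumed input: person ID -> set of genes in which that person carries at
    least one rare predicted loss-of-function variant (filtering done upstream).
    """
    return {pid for pid, genes in plof_genes_by_person.items() if gene in genes}


def contingency(carriers: Set[str], cases: Set[str], everyone: Set[str]) -> Tuple[int, int, int, int]:
    """2x2 counts: (carrier cases, carrier controls, non-carrier cases, non-carrier controls)."""
    non_carriers = everyone - carriers
    return (
        len(carriers & cases),
        len(carriers - cases),
        len(non_carriers & cases),
        len(non_carriers - cases),
    )


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test on a 2x2 table via the hypergeometric distribution."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Probability of observing x carrier cases when all margins are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        # Sum every table that is no more likely than the observed one.
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


def burden_test(plof_genes_by_person: Dict[str, Set[str]], gene: str, cases: Set[str]) -> Tuple[float, bool]:
    """Return (p-value, passes the exome-by-phenome-wide threshold) for one gene-phenotype pair."""
    everyone = set(plof_genes_by_person)
    carriers = burden_carriers(plof_genes_by_person, gene)
    p = fisher_exact_two_sided(*contingency(carriers, cases, everyone))
    return p, p < EXOME_PHENOME_ALPHA
```

For example, `burden_test(people, "GENE_A", cases)` would be called once per gene and phenotype pair, and only pairs whose p-value falls below 10⁻⁶ would be carried forward. Because most genes have few pLoF carriers, an exact test on sparse counts is a natural choice for a toy example. A real analysis would typically also adjust for covariates such as age, sex and genetic ancestry.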

Original language: English
Pages (from-to): 66-72
Number of pages: 7
Journal: Nature Medicine
Issue number: 1
State: Published - Jan 2021

Bibliographical note

Funding Information:
The PMBB is funded by the Perelman School of Medicine at the University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA Award Number UL1TR001878. We thank D. Birtwell, H. Williams, P. Baumann and M. Risman for informatics support regarding the PMBB. We thank the staff of the Regeneron Genetics Center for whole-exome sequencing of DNA from PMBB participants. We thank S. Rathi for help with the real-time PCR experiment on iPSC-RGCs and J. He for help with iPSC-RGC cultures. We thank S. Dudek for assistance with the PMBB Genome Browser. Research reported in this paper was supported by grants from the National Human Genome Research Institute of the National Institutes of Health under award number F30HG010442 (to J.P.); the National Eye Institute of the National Institutes of Health under award number R21EY028273-01A1, BrightFocus Foundation, Lisa Dean Moseley Foundation and Research to Prevent Blindness, F.M. Kirby Foundation and The Paul and Evanina Bell Mackall Foundation Trust (to V.R.M.C.); American Heart Association SFRN in Vascular Disease under award numbers 18SFRN33960114 and 18SFRN33960163 (to S.A.L. and A.D.) and National Institutes of Health under award number 1R01HL143359 (to Y.H.S. and S.A.L.); Sarnoff Cardiovascular Research Foundation (to S.A.); Institute for Translational Medicine and Therapeutics Transdisciplinary Program in Translational Medicine and Therapeutics (to R.L.K.).

Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Nature America, Inc.

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)

