Curated variation benchmarks for challenging medically relevant autosomal genes

Justin Wagner, Nathan D. Olson, Lindsay Harris, Jennifer McDaniel, Haoyu Cheng, Arkarachai Fungtammasan, Yih Chii Hwang, Richa Gupta, Aaron M. Wenger, William J. Rowell, Ziad M. Khan, Jesse Farek, Yiming Zhu, Aishwarya Pisupati, Medhat Mahmoud, Chunlin Xiao, Byunggil Yoo, Sayed Mohammad Ebrahim Sahraeian, Danny E. Miller, David Jáspez, José M. Lorenzo-Salazar, Adrián Muñoz-Barrera, Luis A. Rubio-Rodríguez, Carlos Flores, Giuseppe Narzisi, Uday Shanker Evani, Wayne E. Clarke, Joyce Lee, Christopher E. Mason, Stephen E. Lincoln, Karen H. Miga, Mark T.W. Ebbert, Alaina Shumate, Heng Li, Chen Shan Chin, Justin M. Zook, Fritz J. Sedlazeck

Research output: Contribution to journal › Article › peer-review

60 Scopus citations


The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.

Original language: English
Pages (from-to): 672-680
Number of pages: 9
Journal: Nature Biotechnology
Issue number: 5
State: Published - May 2022

Bibliographical note

Publisher Copyright:
© 2022, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.

ASJC Scopus subject areas

  • Applied Microbiology and Biotechnology
  • Bioengineering
  • Molecular Medicine
  • Biotechnology
  • Biomedical Engineering

