TY - JOUR
T1 - Curated variation benchmarks for challenging medically relevant autosomal genes
AU - Wagner, Justin
AU - Olson, Nathan D.
AU - Harris, Lindsay
AU - McDaniel, Jennifer
AU - Cheng, Haoyu
AU - Fungtammasan, Arkarachai
AU - Hwang, Yih Chii
AU - Gupta, Richa
AU - Wenger, Aaron M.
AU - Rowell, William J.
AU - Khan, Ziad M.
AU - Farek, Jesse
AU - Zhu, Yiming
AU - Pisupati, Aishwarya
AU - Mahmoud, Medhat
AU - Xiao, Chunlin
AU - Yoo, Byunggil
AU - Sahraeian, Sayed Mohammad Ebrahim
AU - Miller, Danny E.
AU - Jáspez, David
AU - Lorenzo-Salazar, José M.
AU - Muñoz-Barrera, Adrián
AU - Rubio-Rodríguez, Luis A.
AU - Flores, Carlos
AU - Narzisi, Giuseppe
AU - Evani, Uday Shanker
AU - Clarke, Wayne E.
AU - Lee, Joyce
AU - Mason, Christopher E.
AU - Lincoln, Stephen E.
AU - Miga, Karen H.
AU - Ebbert, Mark T.W.
AU - Shumate, Alaina
AU - Li, Heng
AU - Chin, Chen Shan
AU - Zook, Justin M.
AU - Sedlazeck, Fritz J.
N1 - Publisher Copyright: © 2022, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.
PY - 2022/5
Y1 - 2022/5
AB - The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.
UR - http://www.scopus.com/inward/record.url?scp=85124358447&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124358447&partnerID=8YFLogxK
U2 - 10.1038/s41587-021-01158-1
DO - 10.1038/s41587-021-01158-1
M3 - Article
C2 - 35132260
AN - SCOPUS:85124358447
SN - 1087-0156
VL - 40
SP - 672
EP - 680
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 5
ER -