Complete vertebrate mitogenomes reveal widespread repeats and gene duplications

Research output: Contribution to journal › Article › peer-review

81 Scopus citations

Abstract

Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.

Results: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.

Conclusions: Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.

Original language: English
Article number: 120
Journal: Genome Biology
Volume: 22
Issue number: 1
DOIs
State: Published - Dec 2021

Bibliographical note

Publisher Copyright: © 2021, The Author(s).

Funding

A. R., S. K., and A. M. P. were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. A. R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). F. O. A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. G. F. and E. D. J. were supported by Rockefeller University start-up funds and the Howard Hughes Medical Institute. A. A. and M. R. C. received support from the Fondazione Cariplo project no. 2018–2045 and the Italian Ministry of Education, University and Research (MIUR) for Progetti PRIN2017 20174BTC4R and Dipartimenti di Eccellenza Program (2018–2022). R. D. and S. M. received funding from Wellcome grant WT207492. We thank the contributors of the VGP on the first 125 species for letting us use data for generating mitochondrial genome assemblies; in particular, they are Alexander N. G. Kirschel, Andrew Digby, Andrew Veale, Anne Bronikowski, Bob Murphy, Bruce Robertson, Clare Baker, Camila Mazzoni, Christopher Balakrishnan, Chul Lee, Daniel Mead, Emma Teeling, Erez Lieberman Aiden, Erica Todd, Evan Eichler, Gavin J.P. Naylor, Guojie Zhang, Jeramiah Smith, Jochen Wolf, Justin Touchon, Kira Delmore, Kjetill Jakobsen, Lisa Komoroske, Mark Wilkinson, Martin Genner, Martin Pšenička, Matthew Fuxjager, Mike Stratton, Miriam Liedvogel, Neil Gemmell, Piotr Minias, Peter O. Dunn, Peter Sudmant, Phil Morin, Sadequr Rahman, Qasim Ayub, Robert Kraus, Sonja Vernes, Steve Smith, Tanya Lama, Taylor Edwards, Tim Smith, Tom Gilbert, Tomas Marques-Bonet, Tony Einfeldt, Byrappa Venkatesh, Warren Johnson, Wes Warren, and Yury Bukhman. We are grateful to the Ngā Papatipu Rūnanga o Murihiku and the Ngāi Tahu for their support in generating the kakapo datasets.
We also thank Aureliano Bombarely for his support in testing Organelle_PBA on VGP datasets. The review history is available as Additional file 12. Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Funder: Funder number
State of Qatar and Monash University Malaysia
Ngā Papatipu Rūnanga o Murihiku
Al-Gannas Qatari Society
National Institutes of Health (NIH)
Rockefeller University
Cultural Village Foundation-Katara
Korea Health Technology R&D Project
Korea Health Industry Development Institute
Howard Hughes Medical Institute
Korean Ministry of Health and Welfare: HI17C2098
Wellcome Trust: WT207492, 207492
Fondazione Cariplo: 2018–2045
Ministero dell’Istruzione, dell’Università e della Ricerca: 20174BTC4R
National Human Genome Research Institute: ZIAHG200398

Keywords

• Assembly
• Duplications
• Long reads
• Mitochondrial DNA
• Repeats
• Sequencing
• Vertebrate

ASJC Scopus subject areas

• Ecology, Evolution, Behavior and Systematics
• Genetics
• Cell Biology
