TY - JOUR
T1 - Interspecific gene flow obscures phylogenetic relationships in an important insect pest species complex
AU - San Jose, Michael
AU - Doorenweerd, Camiel
AU - Geib, Scott
AU - Barr, Norman
AU - Dupuis, Julian R.
AU - Leblanc, Luc
AU - Kauwe, Angela
AU - Morris, Kimberley Y.
AU - Rubinoff, Daniel
N1 - Publisher Copyright: © 2023 Elsevier Inc.
PY - 2023/11
Y1 - 2023/11
N2 - As genomic data proliferates, the prevalence of post-speciation gene flow is making species boundaries and relationships increasingly ambiguous. Although current approaches inferring fully bifurcating phylogenies based on concatenated datasets provide simple and robust answers to many species relationships, they may be inaccurate because the models ignore inter-specific gene flow and incomplete lineage sorting. To examine the potential error resulting from ignoring gene flow, we generated both a RAD-seq and a 500 protein-coding loci highly multiplexed amplicon (HiMAP) dataset for a monophyletic group of 12 species defined as the Bactrocera dorsalis sensu lato clade. With some of the world's worst agricultural pests, the taxonomy of the B. dorsalis s.l. clade is important for trade and quarantines. However, taxonomic confusion confounds resolution due to intra- and interspecific phenotypic variation and convergence, mitochondrial introgression across half of the species, and viable hybrids. We compared the topological convergence of our datasets using concatenated phylogenetic and various multispecies coalescent approaches, some of which account for gene flow. All analyses agreed on species delimitation, but there was incongruence between species relationships. Under concatenation, both datasets suggest identical species relationships with mostly high statistical support. However, multispecies coalescent and multispecies network approaches suggest markedly different hypotheses and detected significant gene flow. We suggest that the network approaches are likely more accurate because gene flow violates the assumptions of the concatenated phylogenetic analyses, but the data-reductive requirements of network approaches resulted in reduced statistical support and could not unambiguously resolve gene flow directions. Our study highlights the importance of testing for gene flow, particularly with phylogenomic datasets, even when concatenated approaches receive high statistical support.
AB - As genomic data proliferates, the prevalence of post-speciation gene flow is making species boundaries and relationships increasingly ambiguous. Although current approaches inferring fully bifurcating phylogenies based on concatenated datasets provide simple and robust answers to many species relationships, they may be inaccurate because the models ignore inter-specific gene flow and incomplete lineage sorting. To examine the potential error resulting from ignoring gene flow, we generated both a RAD-seq and a 500 protein-coding loci highly multiplexed amplicon (HiMAP) dataset for a monophyletic group of 12 species defined as the Bactrocera dorsalis sensu lato clade. With some of the world's worst agricultural pests, the taxonomy of the B. dorsalis s.l. clade is important for trade and quarantines. However, taxonomic confusion confounds resolution due to intra- and interspecific phenotypic variation and convergence, mitochondrial introgression across half of the species, and viable hybrids. We compared the topological convergence of our datasets using concatenated phylogenetic and various multispecies coalescent approaches, some of which account for gene flow. All analyses agreed on species delimitation, but there was incongruence between species relationships. Under concatenation, both datasets suggest identical species relationships with mostly high statistical support. However, multispecies coalescent and multispecies network approaches suggest markedly different hypotheses and detected significant gene flow. We suggest that the network approaches are likely more accurate because gene flow violates the assumptions of the concatenated phylogenetic analyses, but the data-reductive requirements of network approaches resulted in reduced statistical support and could not unambiguously resolve gene flow directions. Our study highlights the importance of testing for gene flow, particularly with phylogenomic datasets, even when concatenated approaches receive high statistical support.
KW - Amplicons
KW - Gene Flow
KW - Multispecies Coalescent
KW - Multispecies Network
KW - Phylogenomics
KW - RAD-seq
UR - http://www.scopus.com/inward/record.url?scp=85167442862&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85167442862&partnerID=8YFLogxK
U2 - 10.1016/j.ympev.2023.107892
DO - 10.1016/j.ympev.2023.107892
M3 - Article
C2 - 37524217
AN - SCOPUS:85167442862
SN - 1055-7903
VL - 188
JO - Molecular Phylogenetics and Evolution
JF - Molecular Phylogenetics and Evolution
M1 - 107892
ER -