Evolutionary hypotheses provide important underpinnings of the biological and medical sciences, and a comprehensive, genome-wide understanding of evolutionary relationships among organisms is needed to test and refine such hypotheses. Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matching topologies. The main reason for such phylogenetic incongruence is the reticulated evolutionary history of most species, due to meiotic sexual recombination in eukaryotes or horizontal transfer of genetic material in prokaryotes. Nevertheless, many genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in poly-dimensional 'tree space'. Unusual evolutionary histories or effects of selection may result in 'outlier' genes with phylogenies that fall outside the main distribution(s) of trees in tree space. We present a new phylogenomic method, CURatio, which uses ratios of total branch lengths in gene trees to help identify phylogenetic outliers in a given set of ortholog groups from multiple genomes. An advantage of CURatio over other methods is that genes absent from and/or duplicated in some genomes can be included in the analysis. We conducted a simulation study under the coalescent model and showed that, given sufficient species depth and topological difference, these ratios are significantly higher for the 'outlier' gene phylogenies. We also applied CURatio to a set of annotated genomes of the fungal family Clavicipitaceae and identified alkaloid biosynthesis genes as outliers, probably due to a history of duplication and loss. The source code is available at https://github.com/QiwenKang/CURatio, and the empirical data set for Clavicipitaceae and the simulated data set are available from Mendeley at https://data.mendeley.com/datasets/mrxts7wjrr/1.
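To make the phrase "ratios of total branch lengths" concrete, the following is a minimal sketch and not the CURatio implementation. The abstract does not say which two trees enter each ratio, so the sketch only assumes that trees are given as Newick strings with non-negative branch lengths; the function names `total_branch_length` and `branch_length_ratio` are hypothetical.

```python
import re

# Matches a non-negative branch length following ':' in a Newick string,
# e.g. ":0.12" or ":3e-4". Quoted labels containing ':' are not handled.
_BRANCH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")


def total_branch_length(newick: str) -> float:
    """Sum all branch lengths found in a Newick string."""
    return sum(float(length) for length in _BRANCH.findall(newick))


def branch_length_ratio(tree_a: str, tree_b: str) -> float:
    """Ratio of the total branch length of tree_a to that of tree_b.

    Which pair of trees is compared is an assumption of this sketch;
    the abstract does not specify it.
    """
    denominator = total_branch_length(tree_b)
    if denominator == 0:
        raise ValueError("tree_b has zero total branch length")
    return total_branch_length(tree_a) / denominator


if __name__ == "__main__":
    tree_a = "((A:0.2,B:0.4):0.1,C:0.6);"
    tree_b = "((A:0.1,B:0.2):0.05,C:0.3);"
    print(branch_length_ratio(tree_a, tree_b))
```

Because only branch lengths are summed, trees with different leaf sets can be passed to these functions; whether and how that relates to CURatio's handling of absent or duplicated genes is not something this sketch attempts to reproduce.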
|Number of pages||9|
|Journal||IEEE/ACM Transactions on Computational Biology and Bioinformatics|
|State||Published - May 1 2020|
Bibliographical note
Funding Information:
The authors thank Walter Hollin for laboratory support, Jennifer S. Webb for genome sequencing, Jolanta Jaromczyk for bioinformatic support, and Jerzy W. Jaromczyk for helpful algorithmic discussions. This work was supported by National Institute of Food and Agriculture grant 2012-67013-19384, National Institutes of Health grants R01GM086888 and 2 P20 RR-16481, and National Science Foundation grant EPS-0814194. Genome sequence analysis was conducted in the University of Kentucky’s Advanced Genetic Technologies Center.
© 2004-2012 IEEE.
- Evolutionary models
- gene trees
- likelihood functions
- species trees
ASJC Scopus subject areas
- Applied Mathematics