TY - JOUR
T1 - Statistical phylogenetic tree analysis using differences of means
AU - Arnaoudova, Elissaveta
AU - Haws, David C.
AU - Huggins, Peter
AU - Jaromczyk, Jerzy W.
AU - Moore, Neil
AU - Schardl, Christopher L.
AU - Yoshida, Ruriko
PY - 2010
Y1 - 2010
N2 - We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by two input alignments, instead of comparing point estimates of trees. This statistical approach can be applied to gene tree analysis, for example, to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method uses the difference of means to compare two distributions of trees, after mapping trees into a vector space. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel method" which speeds up distance calculations when trees are mapped into a high-dimensional feature space, e.g., the splits or quartets feature space. In this pilot study, we first apply our statistical method to data sets simulated under a coalescence model, to test whether two alignments are generated by congruent gene trees. We follow our simulation results with applications to data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.
AB - We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by two input alignments, instead of comparing point estimates of trees. This statistical approach can be applied to gene tree analysis, for example, to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method uses the difference of means to compare two distributions of trees, after mapping trees into a vector space. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel method" which speeds up distance calculations when trees are mapped into a high-dimensional feature space, e.g., the splits or quartets feature space. In this pilot study, we first apply our statistical method to data sets simulated under a coalescence model, to test whether two alignments are generated by congruent gene trees. We follow our simulation results with applications to data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.
KW - Difference of means
KW - Phylogenetic trees
KW - Tree congruency
UR - http://www.scopus.com/inward/record.url?scp=84859627244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859627244&partnerID=8YFLogxK
U2 - 10.3389/fnins.2010.00047
DO - 10.3389/fnins.2010.00047
M3 - Article
AN - SCOPUS:84859627244
SN - 1662-4548
VL - 4
JO - Frontiers in Neuroscience
JF - Frontiers in Neuroscience
IS - AUG
M1 - 47
ER -