## Abstract

Tree reconstruction methods are often judged by their accuracy, measured by how close they get to the true tree. Yet most reconstruction methods, such as maximum likelihood (ML), do not explicitly maximize this accuracy. To address this problem, we propose a Bayesian solution. Given tree samples, we propose finding the tree estimate that is closest on average to the samples. This "median" tree is known as the Bayes estimator (BE). The BE literally maximizes posterior expected accuracy, measured in terms of closeness (distance) to the true tree. We discuss a unified framework of BE trees, focusing especially on tree distances that are expressible as squared Euclidean distances. Notable examples include Robinson-Foulds (RF) distance, quartet distance, and squared path difference. Using both simulated and real data, we show that BEs can be estimated in practice by hill-climbing. In our simulation, we find that BEs tend to be closer to the true tree than ML and neighbor-joining trees. In particular, the BE under squared path difference tends to perform well in terms of both path difference and RF distances.
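As an illustration of the "closest on average" idea, the following minimal Python sketch picks, from a set of sampled trees, the one with the smallest mean RF distance to the sample. It is not the authors' implementation: the split-set tree encoding, the helper names (`rf_distance`, `expected_distance`, `bayes_estimator`), and the toy five-taxon sample are all assumptions made for the example, and the search is restricted to the sampled trees rather than hill-climbing through tree space.

```python
from statistics import mean

# Hypothetical encoding: an unrooted tree is the frozenset of its nontrivial
# splits, each split written as the side that does not contain taxon "A".
def make_tree(*splits):
    return frozenset(frozenset(s) for s in splits)

def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(tree_a ^ tree_b)

def expected_distance(candidate, samples, distance=rf_distance):
    """Mean distance from a candidate tree to the sampled trees."""
    return mean(distance(candidate, s) for s in samples)

def bayes_estimator(samples, candidates=None, distance=rf_distance):
    """Return the candidate that minimizes the mean distance to the samples.

    Assumption: when no candidates are given, only the distinct sampled
    trees are searched, which is a simplification of a full tree search.
    """
    if candidates is None:
        candidates = list(dict.fromkeys(samples))  # distinct, order-preserving
    return min(candidates, key=lambda t: expected_distance(t, samples, distance))

# Toy sample over taxa A-E (made-up data, for illustration only).
T1 = make_tree("CDE", "DE")  # ((A,B),C,(D,E))
T2 = make_tree("BDE", "DE")  # ((A,C),B,(D,E))
T3 = make_tree("CDE", "CE")  # ((A,B),D,(C,E))
samples = [T1, T1, T2, T3]

best = bayes_estimator(samples)
print(best == T1, expected_distance(best, samples))
```

Passing a different `distance` function (for example, a squared path difference) changes which tree is selected, which is the sense in which each distance defines its own BE.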

| Original language | English |
|---|---|
| Pages (from-to) | 528-540 |
| Number of pages | 13 |
| Journal | Systematic Biology |
| Volume | 60 |
| Issue number | 4 |
| State | Published - Jul 2011 |

### Bibliographical note

Funding Information: This work was supported by the Lane Fellowship in Computational Biology (to P.M.H.) and by the National Institutes of Health Research Project Grant Program (R01) from the Joint DMS/BIO/NIGMS Math/Bio Program (1R01GM086888-01 and 5R01GM086888-02 to P.M.H., W.L., D.H., T.F., and R.Y.).

## Keywords

- Bayes estimator
- consensus tree
- path difference metric
- phylogenetic inference

## ASJC Scopus subject areas

- Ecology, Evolution, Behavior and Systematics
- Genetics