TY - GEN

T1 - Approximating the true evolutionary distance between two genomes

AU - Swenson, Krister M.

AU - Marron, Mark

AU - Earnest-DeYoung, Joel V.

AU - Moret, Bernard M.E.

PY - 2005

Y1 - 2005

N2 - As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we extend and improve our previous work in two major ways. We generalize our approach to handle duplications as well as insertions and thus enable the computation of distances between two arbitrary genomes; and our new algorithm approximates true evolutionary distances, as opposed to the less useful edit distances. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable a simple neighbor-joining procedure to reconstruct our test trees with high accuracy.

AB - As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we extend and improve our previous work in two major ways. We generalize our approach to handle duplications as well as insertions and thus enable the computation of distances between two arbitrary genomes; and our new algorithm approximates true evolutionary distances, as opposed to the less useful edit distances. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable a simple neighbor-joining procedure to reconstruct our test trees with high accuracy.

UR - http://www.scopus.com/inward/record.url?scp=32144446113&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32144446113&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:32144446113

SN - 0898715962

T3 - Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics

SP - 121

EP - 129

BT - Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics

A2 - Demetrescu, C.

A2 - Sedgewick, R.

A2 - Tamassia, R.

T2 - Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics

Y2 - 22 January 2005 through 22 January 2005

ER -