TY - GEN
T1 - Approximating the true evolutionary distance between two genomes
AU - Swenson, Krister M.
AU - Marron, Mark
AU - Earnest-DeYoung, Joel V.
AU - Moret, Bernard M.E.
PY - 2005
Y1 - 2005
N2 - As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we extend and improve our previous work in two major ways. We generalize our approach to handle duplications as well as insertions and thus enable the computation of distances between two arbitrary genomes; and our new algorithm approximates true evolutionary distances, as opposed to the less useful edit distances. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable a simple neighbor-joining procedure to reconstruct our test trees with high accuracy.
AB - As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we extend and improve our previous work in two major ways. We generalize our approach to handle duplications as well as insertions and thus enable the computation of distances between two arbitrary genomes; and our new algorithm approximates true evolutionary distances, as opposed to the less useful edit distances. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable a simple neighbor-joining procedure to reconstruct our test trees with high accuracy.
UR - http://www.scopus.com/inward/record.url?scp=32144446113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=32144446113&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:32144446113
SN - 0898715962
T3 - Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics
SP - 121
EP - 129
BT - Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics
A2 - Demetrescu, C.
A2 - Sedgewick, R.
A2 - Tamassia, R.
T2 - Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics
Y2 - 22 January 2005 through 22 January 2005
ER -