TY - GEN
T1 - The genetic algorithm scheme for consensus sequences
AU - Gilkerson, Joshua W.
AU - Jaromczyk, Jerzy W.
PY - 2007
Y1 - 2007
AB - A consensus sequence is a single sequence that represents characteristics of a family of sequences. Such synopses are most commonly used in bioinformatics for sequence analysis. For example, algorithms that determine high-quality consensus sequences are useful for constructing a multiple alignment and, consequently, a sequence logo (another representation that attempts to capture the important features of sequences). The determination of optimal consensus sequences is NP-hard (Gusfield). We present two new algorithms and compare them to previously published methods of determining consensus sequences. The first, CONSENSIZE, is an application of the Genetic Algorithm Scheme (GAS). The other is a simple steepest descent search, which is usually not very useful for NP-hard problems but is surprisingly successful for this application. We discuss both algorithms and experimentally compare their accuracy and efficiency with the Simulated Annealing, Multiple Alignment, and Center String approaches. Test results are presented on both synthetic data and biological sequences.
UR - http://www.scopus.com/inward/record.url?scp=79955287435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955287435&partnerID=8YFLogxK
U2 - 10.1109/CEC.2007.4424975
DO - 10.1109/CEC.2007.4424975
M3 - Conference contribution
AN - SCOPUS:79955287435
SN - 1424413400
SN - 9781424413409
T3 - 2007 IEEE Congress on Evolutionary Computation, CEC 2007
SP - 3870
EP - 3878
BT - 2007 IEEE Congress on Evolutionary Computation, CEC 2007
T2 - 2007 IEEE Congress on Evolutionary Computation, CEC 2007
Y2 - 25 September 2007 through 28 September 2007
ER -