Abstract
Traditional sequence alignment methods are effective in identifying homologous proteins that are highly similar. However, these approaches do not perform well for remote homologous proteins, that is, proteins whose 3D structures are similar but whose sequences are not. Recent biological research reveals that protein sequences contain residues that determine the 3D structure of proteins. In this work, we investigate incorporating this information to aid in the clustering of protein databases. We capture protein residues in the form of patterns with a fixed order among them. First, the significant patterns are extracted from the protein sequences. Based on the extracted patterns, we perform sequence mining to generate the order among them. Finally, we adopt a partition-based method to cluster protein sequences using the pattern and order features. Experiments on the COG and SCOP40 datasets show that our new approach is able to generate high-quality clusters that are similar to those determined manually by biologists.
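The three steps named in the abstract (pattern extraction, ordering of patterns, partition-based clustering) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: frequent k-mers stand in for the "significant patterns", ordered pairs of patterns (by first occurrence) stand in for the mined order features, Jaccard distance is the dissimilarity measure, and a simple k-medoids loop stands in for the partition-based method. The sequences are toy data, and all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def frequent_kmers(seqs, k=3, min_support=2):
    """Assumed stand-in for 'significant patterns': k-mers found in >= min_support sequences."""
    support = Counter()
    for s in seqs:
        support.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return sorted(p for p, c in support.items() if c >= min_support)


def features(seq, patterns):
    """Pattern features plus order features (p, q), meaning p first occurs before q."""
    pos = {p: seq.find(p) for p in patterns if p in seq}
    feats = set(pos)
    for p, q in permutations(pos, 2):
        if pos[p] < pos[q]:
            feats.add((p, q))
    return feats


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def assign(items, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: jaccard_distance(x, items[medoids[c]]))
        for x in items
    ]


def k_medoids(items, k, iters=10):
    """Simple partition-based clustering; deterministic start from the first k items."""
    medoids = list(range(k))
    for _ in range(iters):
        labels = assign(items, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(jaccard_distance(items[m], items[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(items, medoids)


# Toy sequences (invented): items 0 and 2 share motifs, as do items 1 and 3.
seqs = ["MKVLAAGGHWTR", "PPQDELNNCYFI", "MKVLATGGHWSR", "PPQDELSNCYFV"]
patterns = frequent_kmers(seqs)
feature_sets = [features(s, patterns) for s in seqs]
labels = k_medoids(feature_sets, k=2)
print(labels)
```

In this sketch, two sequences end up in the same cluster when they share both the same patterns and the same relative order of those patterns, even if the residues between the patterns differ. A real pipeline would need a proper significance test for patterns and a sequence-mining step for the order, neither of which is specified in the abstract.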
Original language | English |
---|---|
Pages (from-to) | 26-30 |
Number of pages | 5 |
Journal | Proceedings of the International Conference on Tools with Artificial Intelligence |
State | Published - 2003 |
Event | Proceedings: 15th IEEE International Conference on Tools with Artificial Intelligence - Sacramento, CA, United States. Duration: Nov 3 2003 → Nov 5 2003 |
ASJC Scopus subject areas
- Software