TY - GEN
T1 - A new feature selection method for improving the precision of diagnosing abnormal protein sequences by support vector machine and vectorization method
AU - Kim, Eun Mi
AU - Jeong, Jong Cheol
AU - Pae, Ho Young
AU - Lee, Bae Ho
PY - 2007
Y1 - 2007
N2 - Pattern recognition and classification are among the most popular problems in machine learning, and they appear to be entering a second golden age with bioinformatics. However, bioinformatics datasets have several distinctive characteristics compared with the datasets used in classical pattern recognition and classification research. One of the main difficulties in applying these techniques to bioinformatics is that raw DNA or protein sequences cannot be used directly as input to machine learning, because sequences differ in length. This paper therefore introduces one method for overcoming this difficulty and argues, on the basis of simple experiments, that its generalization capability is very poor. Finally, the paper proposes a different approach that selects a fixed number of effective features using a Support Vector Machine and a noise whitening method. The paper also defines the criteria for the proposed method and shows, in an experiment classifying an ovarian cancer dataset, that it improves the precision of diagnosing abnormal protein sequences.
AB - Pattern recognition and classification are among the most popular problems in machine learning, and they appear to be entering a second golden age with bioinformatics. However, bioinformatics datasets have several distinctive characteristics compared with the datasets used in classical pattern recognition and classification research. One of the main difficulties in applying these techniques to bioinformatics is that raw DNA or protein sequences cannot be used directly as input to machine learning, because sequences differ in length. This paper therefore introduces one method for overcoming this difficulty and argues, on the basis of simple experiments, that its generalization capability is very poor. Finally, the paper proposes a different approach that selects a fixed number of effective features using a Support Vector Machine and a noise whitening method. The paper also defines the criteria for the proposed method and shows, in an experiment classifying an ovarian cancer dataset, that it improves the precision of diagnosing abnormal protein sequences.
UR - http://www.scopus.com/inward/record.url?scp=38049068859&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049068859&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-71629-7_41
DO - 10.1007/978-3-540-71629-7_41
M3 - Conference contribution
AN - SCOPUS:38049068859
SN - 9783540715900
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 364
EP - 372
BT - Adaptive and Natural Computing Algorithms - 8th International Conference, ICANNGA 2007, Proceedings
T2 - 8th International Conference on Adaptive and Natural Computing Algorithms, ICANNGA 2007
Y2 - 11 April 2007 through 14 April 2007
ER -