Abstract
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations and contain significant noise, missing components, or outliers. Features extracted from such data need to be discriminative and sparse, and they must capture the essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into the proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting these biases. For large-scale problems, we use conjugate gradient-based primal-dual interior-point techniques. We apply our procedure to microarray gene analysis, and experimental results confirm the effectiveness of our method.
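The abstract does not spell out the selector or the bias correction, so the following is only a rough illustration of the general idea that sparse ℓ1-type estimates are shrunk toward zero and can be debiased afterwards. It assumes a squared-loss ℓ1-penalized linear model (the lasso) as a stand-in for the persistence-based selector, solves it with ISTA (proximal gradient) rather than the conjugate gradient-based primal-dual interior-point method used in the paper, and reduces the shrinkage bias by refitting ordinary least squares on the selected features. The names `lasso_ista`, `debiased_fit`, `solve`, and the penalty parameter `lam` are hypothetical; this is a minimal sketch, not the authors' procedure.

```python
"""Hypothetical sketch: l1-penalized feature selection plus a debiasing refit.

Assumption: the lasso objective 0.5*||y - X b||^2 + lam*||b||_1, solved with
ISTA (proximal gradient), stands in for the paper's persistence-based selector
and its primal-dual interior-point solver. Not the authors' method.
"""
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t (exact zero inside [-t, t])."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_ista(X: Matrix, y: Vector, lam: float, iters: int = 1000) -> Vector:
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by iterative soft-thresholding."""
    n, p = len(X), len(X[0])
    # Step 1/L with L = ||X||_F^2, an upper bound on the largest eigenvalue of X^T X.
    step = 1.0 / sum(x * x for row in X for x in row)
    b = [0.0] * p
    for _ in range(iters):
        r = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]  # X^T r = -gradient
        b = [soft_threshold(b[j] + step * g[j], step * lam) for j in range(p)]
    return b


def solve(A: Matrix, c: Vector) -> Vector:
    """Solve the square system A x = c by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(map(float, row)) + [float(c[i])] for i, row in enumerate(A)]
    for k in range(m):
        piv = max(range(k, m), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, m):
            f = M[i][k] / M[k][k]
            for j in range(k, m + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * m
    for k in reversed(range(m)):
        x[k] = (M[k][m] - sum(M[k][j] * x[j] for j in range(k + 1, m))) / M[k][k]
    return x


def debiased_fit(X: Matrix, y: Vector, lam: float,
                 tol: float = 1e-10) -> Tuple[Vector, List[int], Vector]:
    """Select features with the lasso, then refit least squares on the selected support.

    The refit assumes the selected columns of X are linearly independent
    (in particular, fewer selected features than observations).
    """
    b = lasso_ista(X, y, lam)
    support = [j for j, v in enumerate(b) if abs(v) > tol]
    Xs = [[row[j] for j in support] for row in X]
    k = len(support)
    # Normal equations (Xs^T Xs) w = Xs^T y on the selected columns only.
    A = [[sum(row[s] * row[t] for row in Xs) for t in range(k)] for s in range(k)]
    rhs = [sum(row[s] * y[i] for i, row in enumerate(Xs)) for s in range(k)]
    w = solve(A, rhs) if k else []
    refit = [0.0] * len(b)
    for j, v in zip(support, w):
        refit[j] = v
    return b, support, refit
```

On a toy design with orthonormal columns, `X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]` and `y = [3.0, 0.2, -2.0, 0.1]` with `lam=0.5`, the lasso step selects features 0 and 2 with weights of about 2.5 and -1.5, shrunk toward zero by `lam`, while the least-squares refit recovers 3 and -2. For classification, `y` could hold ±1 class labels and a new sample could be assigned by the sign of its fitted linear score; that, too, is an assumption rather than something the abstract states.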
Original language | English |
---|---|
Article number | 4770093 |
Pages (from-to) | 636-646 |
Number of pages | 11 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 7 |
Issue number | 4 |
DOIs | |
State | Published - 2010 |
Keywords
- High-dimensional data
- bias
- cancer classification
- convex optimization
- feature selection
- microarray gene analysis
- persistence
- primal-dual interior-point optimization
ASJC Scopus subject areas
- Biotechnology
- Genetics
- Applied Mathematics