REC: Fast sparse regression-based multicategory classification

Chong Zhang, Xiaoling Lu, Zhengyuan Zhu, Yin Hu, Darshan Singh, Corbin Jones, Jinze Liu, Jan F. Prins, Yufeng Liu

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Recent advances in technology enable researchers to gather and store enormous data sets with ultra-high dimensionality. In bioinformatics, microarray and next-generation sequencing technologies can produce data with tens of thousands of predictors, or biomarkers. On the other hand, the corresponding sample sizes are often limited. For classification problems, to predict new observations with high accuracy and to better understand the effect of predictors on classification, it is desirable, and often necessary, to train the classifier with variable selection. In the literature, sparse regularized classification techniques have been popular because they perform classification and variable selection simultaneously. Despite their success, such sparse penalized methods can be computationally slow when the dimension of the problem is ultra high. To overcome this challenge, we propose a new sparse REgression-based multicategory Classifier (REC). Our method uses a simplex to represent the different categories of the classification problem. A major advantage of REC is that the optimization can be decoupled into smaller, independent sparse penalized regression problems, which can therefore be solved with parallel computing. Consequently, REC is extraordinarily fast computationally. Moreover, REC provides estimates of class conditional probabilities. Simulated examples and applications to microarray and next-generation sequencing data suggest that REC is very competitive with several existing methods.
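The abstract does not spell out REC's loss, penalty, or simplex coordinates, so the following is only a minimal sketch of how a simplex-coded sparse regression classifier can decouple, not the authors' implementation. It assumes: (1) the k classes are coded as the vertices of a regular simplex in R^(k-1), centred at the origin with unit-length vertices (the construction used in angle-based classification); (2) a squared-error loss; and (3) an L1 (LASSO) penalty applied separately to each of the k-1 coefficient columns. Under (2) and (3) the joint objective splits into k-1 independent LASSO problems, which is the kind of decoupling the abstract describes. A new point is assigned to the class whose vertex has the largest inner product with the fitted vector. All function names (`simplex_vertices`, `lasso_cd`, `fit_simplex_lasso`, `predict`) are hypothetical, and class probability estimation is not sketched.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def simplex_vertices(k):
    """Rows W_1..W_k: vertices of a regular simplex in R^(k-1), centred at the
    origin, each of unit length (assumed coding; pairwise inner product -1/(k-1))."""
    W = np.zeros((k, k - 1))
    W[0] = 1.0 / np.sqrt(k - 1)
    for j in range(1, k):
        W[j] = -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5
        W[j, j - 1] += np.sqrt(k / (k - 1))
    return W


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    X and y are assumed centred, so no intercept is fitted here."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                  # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]   # correlation with partial residual
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r += X[:, j] * (b[j] - new)           # keep residual in sync with b
            b[j] = new
    return b


def fit_simplex_lasso(X, labels, k, lam):
    """Hypothetical fit: regress vertex-coded labels on X, one LASSO per coordinate."""
    W = simplex_vertices(k)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Y = W[labels]                                 # n x (k-1) response: vertex of each label
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean
    # Squared error and a coordinate-wise L1 penalty both split across the k-1
    # columns, so each column is an independent LASSO problem run in parallel.
    with ThreadPoolExecutor() as pool:
        coefs = list(pool.map(lambda m: lasso_cd(Xc, Yc[:, m], lam), range(k - 1)))
    B = np.column_stack(coefs)                    # p x (k-1) coefficient matrix
    intercept = y_mean - x_mean @ B
    return W, B, intercept


def predict(X, W, B, intercept):
    F = X @ B + intercept                         # fitted points in R^(k-1)
    return np.argmax(F @ W.T, axis=1)             # vertex with the largest inner product


# Usage on synthetic data: only the first two predictors carry class information.
rng = np.random.default_rng(0)
n, p, k = 300, 50, 3
labels = rng.integers(0, k, size=n)
X = rng.normal(size=(n, p))
X[:, :2] += 3.0 * simplex_vertices(k)[labels]
W, B, b0 = fit_simplex_lasso(X, labels, k, lam=0.05)
acc = np.mean(predict(X, W, B, b0) == labels)
print("training accuracy:", acc)
print("selected predictors:", np.flatnonzero(np.abs(B).sum(axis=1) > 0))
```

The thread pool only illustrates that the k-1 fits share nothing but the design matrix; because the inner loop here is pure Python, real speedups would call for a process pool, distributed workers, or a compiled LASSO solver. A penalty that couples the k-1 columns (for example, a group penalty on each predictor's row of B) would not decouple in this way.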

Original language: English
Pages (from-to): 175-185
Number of pages: 11
Journal: Statistics and its Interface
Volume: 10
Issue number: 2
State: Published - 2017

Bibliographical note

Funding Information:
The authors would like to thank the Editor, Prof. Heping Zhang, for helpful suggestions. The authors were supported in part by the US National Science Foundation (NSF grants DMS1407241 and IIS1054631), the Natural Sciences and Engineering Research Council of Canada (NSERC), NIH grants CA149569, HG06272, CA142538, and P30CA177558, and the National Natural Science Foundation of China (NSFC 61472475).


Funders: Funder number
National Institutes of Health (NIH): P30CA177558, CA149569, CA142538, HG06272
National Science Foundation (NSF): IIS1054631, DMS1407241
Natural Sciences and Engineering Research Council of Canada (NSERC)
National Natural Science Foundation of China (NSFC): NSFC 61472475

    Keywords

    • LASSO
    • Parallel computing
    • Probability estimation
    • Simplex
    • Variable selection

    ASJC Scopus subject areas

    • Statistics and Probability
    • Applied Mathematics

