Abstract
Background: Over the last decade, metabolomics has evolved into a mainstream enterprise utilized by many laboratories globally. Like other "omics" data, metabolomics data has the characteristics of a smaller sample size compared to the number of features evaluated. Thus the selection of an optimal subset of features with a supervised classifier is imperative. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of "omics" data, and proposed two such extensions referred to as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset that compares the metabolic profiles of hepatocellular carcinoma (HCC) infected with hepatitis B (HBV) or C virus (HCV) with that of cirrhosis induced by HBV/HCV infection; the goal was to improve early-stage diagnosis of HCC.
Results: We applied two multi-TGDR frameworks to the HCC metabolomics data that determined TGDR thresholds either globally across classes, or locally for each class. Multi-TGDR global model selected 45 metabolites with a 0% misclassification rate (the error rate on the training data) and had a 3.82% 5-fold cross-validation (CV-5) predictive error rate. Multi-TGDR local selected 48 metabolites with a 0% misclassification rate and a 5.34% CV-5 error rate.
Conclusions: One important advantage of multi-TGDR local is that it allows inference for determining which feature is related specifically to the class/classes. Thus, we recommend multi-TGDR local be used because it has similar predictive performance and requires the same computing time as multi-TGDR global, but may provide class-specific inference.
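The abstract does not spell out the update rule, so the following is only a rough sketch of how global and local thresholds might differ. It assumes the usual threshold gradient descent form (at each step, update only the coefficients whose gradient magnitude reaches a fraction `tau` of the largest gradient magnitude) applied to a multinomial logistic model. The function name `multi_tgdr` and the parameters `tau`, `step`, `n_iter`, and `local` are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_tgdr(X, y, n_classes, tau=0.8, step=0.01, n_iter=200, local=False):
    """Hypothetical multi-class threshold gradient descent sketch.

    X: list of samples (each a list of p feature values)
    y: list of integer class labels in 0..n_classes-1
    tau: threshold in [0, 1]; larger values give sparser updates
    local: False -> one threshold shared by all classes ("global"),
           True  -> one threshold per class ("local")
    Returns a p x n_classes coefficient matrix as nested lists.
    """
    n, p = len(X), len(X[0])
    B = [[0.0] * n_classes for _ in range(p)]
    for _ in range(n_iter):
        # Gradient of the mean multinomial log-likelihood w.r.t. B.
        G = [[0.0] * n_classes for _ in range(p)]
        for xi, yi in zip(X, y):
            scores = [sum(xi[j] * B[j][k] for j in range(p))
                      for k in range(n_classes)]
            probs = softmax(scores)
            for k in range(n_classes):
                resid = (1.0 if yi == k else 0.0) - probs[k]
                for j in range(p):
                    G[j][k] += xi[j] * resid / n
        # Reference magnitude for the threshold: per class or overall.
        if local:
            peaks = [max(abs(G[j][k]) for j in range(p))
                     for k in range(n_classes)]
        else:
            top = max(abs(g) for row in G for g in row)
            peaks = [top] * n_classes
        # Move only the coefficients whose gradient passes the threshold.
        for j in range(p):
            for k in range(n_classes):
                if abs(G[j][k]) >= tau * peaks[k]:
                    B[j][k] += step * G[j][k]
    return B
```

Under these assumptions, features whose coefficients stay at zero in every class are the ones left unselected. With `local=True`, the nonzero pattern in each column can be read per class, which is one way the class-specific inference mentioned in the Conclusions could arise.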
| Original language | English |
|---|---|
| Article number | 97 |
| Journal | BMC Bioinformatics |
| Volume | 15 |
| Issue number | 1 |
| DOI | |
| Status | Published - Apr 4 2014 |
Bibliographical note
Funding Information: The study was supported by Natural Science Foundation of China (No 81172727 and 81202377). ST was also partially supported by a seed fund from the Jilin University (No 450060491885). We are grateful to two reviewers for their helpful comments and to Catherine Anthony for scientific editing. Especially, we thank Drs. Margaret MacDonald and Ype De Jong of the Rockefeller University for helpful discussion.
Funding
| Funders | Funder number |
|---|---|
| National Childhood Cancer Registry – National Cancer Institute | P30CA177558 |
| National Natural Science Foundation of China (NSFC) | 81172727, 81202377 |
| Jilin University | 450060491885 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- Good health and well-being
ASJC Scopus subject areas
- Structural Biology
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Applied Mathematics
Fingerprint
Dive into the research topics of 'Multi-TGDR, a multi-class regularization method, identifies the metabolic profiles of hepatocellular carcinoma and cirrhosis infected with hepatitis B or hepatitis C virus'. Together they form a unique fingerprint.