Abstract
There has been a recent surge of interest in using genetic data to build accurate and interpretable machine learning (ML) models for disease classification. In this line of research, we separately assess the potential of peripheral blood gene expression data and Single Nucleotide Polymorphism (SNP) data for building ML models that classify Alzheimer's disease (AD). We present a systematic approach to feature selection and ML model design using both types of genetic data provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our two-step feature selection produced a curated list of important genes. To examine the role of non-genetic covariates, we added age and number of years of education (EDU) as extra features alongside the selected genetic features. In the Control (CN) vs. AD classification, the best-performing classifier, XGBoost, achieved an Area Under the Curve (AUC) of 0.64 when trained on gene expression features only and 0.65 when the extra features were included. For the same task using SNP data, the AUC was 0.56 with SNP features only and 0.64 with the extra features included. The just-above-chance performance of the classifier trained on SNP features alone, together with its improvement once the covariates are added, indicates that SNP data has low potential for AD classification when used alone and underscores the importance of non-genetic factors associated with AD. In contrast, gene expression features perform well above chance and show great potential, especially for distinguishing between stages of AD progression, i.e., CN vs. AD, CN vs. early mild cognitive impairment (EMCI), EMCI vs. AD, and late mild cognitive impairment (LMCI) vs. AD. The source code and manual are available at https://github.com/mvrl/ADNI_Genetics.
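To make the reported AUC values easier to interpret, the following is a minimal sketch of how an AUC can be computed from classifier scores, assuming the AUC here refers to the area under the ROC curve. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations and are not taken from the paper, its repository, or ADNI data. Under this definition, AUC is the probability that a randomly chosen positive (e.g., AD) sample receives a higher score than a randomly chosen negative (e.g., CN) sample, so 0.5 corresponds to chance and 1.0 to perfect separation.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0 (negative) / 1 (positive) class labels.
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT ADNI): 3 negatives followed by 3 positives.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.70]

toy_auc = roc_auc(toy_labels, toy_scores)            # 7 of 9 pairs ranked correctly
chance_auc = roc_auc(toy_labels, [0.5] * 6)          # uninformative scores
perfect_auc = roc_auc(toy_labels, [0, 0, 0, 1, 1, 1])  # perfectly separated scores
```

Read this way, an AUC of 0.56 means the classifier ranks a random AD sample above a random CN sample only slightly more often than a coin flip would, while 0.64 to 0.65 reflects a modest but clearer ranking signal.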
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 |
Editors | Yufei Huang, Lukasz Kurgan, Feng Luo, Xiaohua Tony Hu, Yidong Chen, Edward Dougherty, Andrzej Kloczkowski, Yaohang Li |
Pages | 2245-2252 |
Number of pages | 8 |
ISBN (Electronic) | 9781665401265 |
State | Published - 2021 |
Event | 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 - Virtual, Online, United States; Duration: Dec 9 2021 → Dec 12 2021 |
Publication series
Name | Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 |
---|
Conference
Conference | 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 12/9/21 → 12/12/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Alzheimer's disease
- Feature Selection
- Genetics
- Machine Learning
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Biomedical Engineering
- Health Informatics
- Information Systems and Management