Alzheimer's Disease Classification Using Genetic Data

Subash Khanal, Jin Chen, Nathan Jacobs, Ai Ling Lin

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

There has been a recent surge of interest in using genetic data to build accurate and interpretable ML-based disease classification models. In this line of research, we separately assess the potential of peripheral blood gene expression data and Single Nucleotide Polymorphism (SNP) data for building ML models for Alzheimer's disease (AD) classification. We present a systematic approach to feature selection and ML model design using both types of genetic data provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our two-step feature selection produced a curated list of important genes. To examine the role of non-genetic covariates, we also included age and number of years of education (EDU) as extra features alongside the selected genetic features. In the Control (CN) vs. AD classification, the best-performing classifier, XGBoost, achieved an Area Under the Curve (AUC) of 0.64 when trained with gene expression features only and 0.65 with the extra features included. For the same task using SNP data, the AUC was 0.56 with SNP features only and 0.64 with the extra features included. The just-above-chance results of the classifier trained with SNP features alone, and the improvement when additional covariates are added, indicate that SNP data has low potential for AD classification when used alone, and also point to the importance of non-genetic factors associated with AD. Nevertheless, with well-above-chance performance, gene expression features show great potential, especially for distinguishing between groups of AD progression, i.e., CN vs. AD, CN vs. EMCI, EMCI vs. AD and LMCI vs. AD. The source code and manual are available at https://github.com/mvrl/ADNI_Genetics.
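To make the reported AUC values easier to read, the following is a minimal sketch of how AUC can be computed and why 0.5 corresponds to chance. It is not taken from the paper or its repository: the function name `roc_auc`, the toy labels, and the scores are all hypothetical, and the pairwise (Mann-Whitney) formulation is just one standard way to compute the metric.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not ADNI data): 1 = AD, 0 = CN.
labels = [1, 1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every AD sample outranks every CN sample
uninformative = [0.5] * 6                   # identical scores carry no information
partial = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7]    # some AD/CN pairs are ranked wrongly

print(roc_auc(labels, perfect))
print(roc_auc(labels, uninformative))
print(roc_auc(labels, partial))
```

Under this reading, an AUC of 0.56 means the classifier ranks a randomly chosen AD subject above a randomly chosen CN subject only slightly more often than a coin flip would, while 0.64 to 0.65 reflects a modest but clearer separation.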

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Editors: Yufei Huang, Lukasz Kurgan, Feng Luo, Xiaohua Tony Hu, Yidong Chen, Edward Dougherty, Andrzej Kloczkowski, Yaohang Li
Pages: 2245-2252
Number of pages: 8
ISBN (Electronic): 9781665401265
State: Published - 2021
Event: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 - Virtual, Online, United States
Duration: Dec 9 2021 → Dec 12 2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021

Conference

Conference: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Country/Territory: United States
City: Virtual, Online
Period: 12/9/21 → 12/12/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

Keywords

  • Alzheimer's disease
  • Feature Selection
  • Genetics
  • Machine Learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Biomedical Engineering
  • Health Informatics
  • Information Systems and Management
