Pairwise correlation analysis of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset reveals significant feature correlation

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) contains extensive patient measurements (e.g., magnetic resonance imaging [MRI], biometrics, RNA expression, etc.) from Alzheimer’s disease (AD) cases and controls that have recently been used by machine learning algorithms to evaluate AD onset and progression. While using a variety of biomarkers is essential to AD research, highly correlated input features can significantly decrease machine learning model generalizability and performance. Additionally, redundant features unnecessarily increase computational time and resources necessary to train predictive models. Therefore, we used 49,288 biomarkers and 793,600 extracted MRI features to assess feature correlation within the ADNI dataset to determine the extent to which this issue might impact large scale analyses using these data. We found that 93.457% of biomarkers, 92.549% of the gene expression values, and 100% of MRI features were strongly correlated with at least one other feature in ADNI based on our Bonferroni corrected α (p-value ≤ 1.40754 × 10⁻¹³). We provide a comprehensive mapping of all ADNI biomarkers to highly correlated features within the dataset. Additionally, we show that significant correlation within the ADNI dataset should be resolved before performing bulk data analyses, and we provide recommendations to address these issues. We anticipate that these recommendations and resources will help guide researchers utilizing the ADNI dataset to increase model performance and reduce the cost and complexity of their analyses.
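
As a rough check on the threshold quoted in the abstract, the sketch below computes a Bonferroni-corrected α under two assumptions that the abstract does not state explicitly: a family-wise α of 0.05, and one test for every unordered pair among the 49,288 biomarkers and 793,600 MRI features combined. The function name `bonferroni_alpha` is illustrative only and is not taken from the article.

```python
from math import comb


def bonferroni_alpha(n_features: int, family_alpha: float = 0.05) -> float:
    """Per-test significance threshold, assuming each unordered pair
    of features is tested exactly once (an assumption, see lead-in)."""
    n_tests = comb(n_features, 2)  # number of unordered feature pairs
    return family_alpha / n_tests


# Feature counts taken from the abstract: biomarkers + extracted MRI features.
n_features = 49_288 + 793_600
alpha = bonferroni_alpha(n_features)

print(f"{comb(n_features, 2):,} pairwise tests")
print(f"corrected alpha = {alpha:.5e}")
```

Under these assumptions the result rounds to 1.40754e-13, which agrees with the corrected α reported in the abstract; other choices of family-wise α or test count would not.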

Original language: English
Article number: 1661
Journal: Genes
Volume: 12
Issue number: 11
State: Published - Nov 2021

Bibliographical note

Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.

Funding

Acknowledgments: We thank the donors to the BrightFocus Foundation for their contributions to this research. We also acknowledge the Sanders-Brown Center on Aging at the University of Kentucky, Brigham Young University, and the Office of Research Computing at Brigham Young University for their institutional support and resources. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. 
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org. This work was supported by the BrightFocus Foundation [A2020118F to Miller] and the National Institutes of Health [1P30AG072946-01 to the University of Kentucky Alzheimer’s Disease Research Center]. Data collection and sharing for this project was funded by the National Institute on Aging [R01AG046171, RF1AG051550 and 3U01AG024904-09S4 to the Alzheimer’s Disease Metabolomics Consortium].

Funders (funder number, where listed)
Euroimmun
Alzheimer’s Disease Metabolomics Consortium
National Institute of Biomedical Imaging and Bioengineering
Eli Lilly and Company
F. Hoffmann-La Roche AG
DoD Alzheimer’s Disease Neuroimaging Initiative
DOD ADNI
U.S. Department of Defense: W81XWH-12-2-0012
National Institute on Aging: U01AG024904, RF1AG054052, RF1AG051550, R01AG046171
National Institutes of Health (NIH): 1P30AG072946-01
BrightFocus Foundation: A2020118F

    Keywords

    • ADNI
    • Alzheimer’s disease
    • Feature reduction
    • Machine learning
    • Pairwise feature correlation

    ASJC Scopus subject areas

    • Genetics
    • Genetics (clinical)
