Prediction of incident diabetes in the Jackson Heart Study using high-dimensional machine learning

Ramon Casanova, Santiago Saldana, Sean L. Simpson, Mary E. Lacy, Angela R. Subauste, Chad Blackshear, Lynne Wagenknecht, Alain G. Bertoni

Research output: Contribution to journal › Article › peer-review

36 Scopus citations


Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The full RF model performed similarly (AUC = 0.82) to those more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data.
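
The abstract summarizes model performance with the area under the ROC curve (AUC). As a minimal illustration of what that metric measures, the sketch below computes AUC from its pairwise (Mann-Whitney) definition in plain Python. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the study; the sketch does not reproduce the authors' Random Forest pipeline.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    This equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the Jackson Heart Study):
# 1 = developed diabetes, 0 = did not; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = auc(labels, scores)
print(f"toy AUC = {toy_auc:.3f}")
```

Read this way, an AUC of 0.82 means that a randomly chosen participant who developed diabetes received a higher predicted risk than a randomly chosen participant who did not in about 82% of such pairs. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for cohorts of several thousand participants.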

Original language: English
Article number: e0163942
Journal: PLoS ONE
Issue number: 10
State: Published - Oct 2016

Bibliographical note

Publisher Copyright:
© 2016 Casanova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

ASJC Scopus subject areas

  • General

