
Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data

  • Sili Fan
  • Tobias Kind
  • Tomas Cajka
  • Stanley L. Hazen
  • W. H. Wilson Tang
  • Rima Kaddurah-Daouk
  • Marguerite R. Irvin
  • Donna K. Arnett
  • Dinesh K. Barupal
  • Oliver Fiehn

Research output: Article, peer-reviewed

245 Citations (Scopus)

Abstract

Large-scale untargeted lipidomics experiments involve the measurement of hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. Such extensive data acquisition processes introduce a variety of systematic errors, including batch differences, longitudinal drifts, or even instrument-to-instrument variation. Technical data variance can obscure the true biological signal and hinder biological discoveries. To combat this issue, we present a novel normalization approach based on using quality control pool samples (QC). This method is called systematic error removal using random forest (SERRF) for eliminating the unwanted systematic variations in large sample sets. We compared SERRF with 15 other commonly used normalization methods using six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical errors for these data sets to 5% relative standard deviation. We conclude that SERRF outperforms other existing methods and can significantly reduce the unwanted systematic variation, revealing biological variance of interest.
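The abstract reports technical error as relative standard deviation (RSD), which is the standard deviation divided by the mean, expressed as a percentage. As a minimal sketch of how such a figure could be computed from QC pool injections, the snippet below calculates a per-compound RSD and then averages it across compounds. The function names, the use of the sample standard deviation, the simple mean across compounds, and the intensity values are all assumptions made for illustration; they are not taken from the paper and this is not the SERRF method itself.

```python
from statistics import mean, stdev


def relative_standard_deviation(intensities):
    """Return the RSD (%) of one compound's intensities across QC injections.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    avg = mean(intensities)
    if avg == 0:
        raise ValueError("mean intensity is zero; RSD is undefined")
    return 100.0 * stdev(intensities) / avg


def average_qc_rsd(qc_table):
    """Average the per-compound RSD over all compounds.

    qc_table maps a compound name to its intensities in the QC injections.
    Assumption: a plain arithmetic mean is taken across compounds.
    """
    return mean(relative_standard_deviation(v) for v in qc_table.values())


# Hypothetical intensities for two lipids measured in four QC injections.
qc_table = {
    "lipid_A": [100.0, 104.0, 96.0, 100.0],
    "lipid_B": [50.0, 55.0, 45.0, 50.0],
}
print(round(average_qc_rsd(qc_table), 2))
```

Comparing this value before and after normalization is one way to judge whether a method has reduced technical variation, under the assumption that the QC injections are replicates of the same pooled material.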

Original language: English
Pages (from-to): 3590-3596
Number of pages: 7
Journal: Analytical Chemistry
Volume: 91
Issue number: 5
DOI
Publication status: Published - Mar 5, 2019

Bibliographical note

Publisher Copyright:
© 2019 American Chemical Society.

Funding

Funding for the “West Coast Metabolomics Center for Compound Identification” was provided by the National Institutes of Health under the award number NIH U2C ES030158 (to O.F.). Additional funding was provided by the American Heart Association grant 15SDG25760020 and NIH U01 HL072524 (to M.R.I.), NIH 7R01HL091357-06 (to R.K.-D.), and NIH HL113452 (to S.L.H.) for biospecimen collection and data acquisitions. We acknowledge the contributions of the Alzheimer’s Disease Neuroimaging Initiative and the Alzheimer’s Disease Metabolomics Consortium in establishing the ADNI1 lipidomics dataset.

Funders and funder numbers
National Institutes of Health (NIH): U2C ES030158
American Heart Association: 7R01HL091357-06, U01 HL072524, 15SDG25760020

    ASJC Scopus subject areas

    • Analytical Chemistry
