Deriving lipid classification based on molecular formulas

Joshua M. Mitchell, Robert M. Flight, Hunter N.B. Moseley

Research output: Contribution to journal › Article › peer-review


Abstract

Despite instrument and algorithmic improvements, the untargeted and accurate assignment of metabolites remains an unsolved problem in metabolomics. New assignment methods such as our SMIRFE algorithm can assign elemental molecular formulas to observed spectral features in a highly untargeted manner without orthogonal information from tandem MS or chromatography. However, for many lipidomics applications, it is necessary to know at least the lipid category or class that is associated with a detected spectral feature to derive a biochemical interpretation. Our goal is to develop a method for robustly classifying elemental molecular formula assignments into lipid categories for an application to SMIRFE-generated assignments. Using a Random Forest machine learning approach, we developed a method that can predict lipid category and class from SMIRFE non-adducted molecular formula assignments. Our methods achieve high average predictive accuracy (>90%) and precision (>83%) across all eight of the lipid categories in the LIPIDMAPS database. Classification performance was evaluated using sets of theoretical, data-derived, and artifactual molecular formulas. Our methods enable the lipid classification of non-adducted molecular formula assignments generated by SMIRFE without orthogonal information, facilitating the biochemical interpretation of untargeted lipidomics experiments. This lipid classification appears insufficient for validating single-spectrum assignments, but could be useful in cross-spectrum assignment validation.
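As a rough illustration of the kind of input such a classifier could work from, the sketch below turns a non-adducted molecular formula into a fixed-length vector of element counts. The abstract does not say which features the authors derived from each formula, so the element-count encoding, the element list, and the function names here are all assumptions made for illustration, not the published method.

```python
import re
from collections import Counter

# Hypothetical feature order; the abstract does not specify the real feature set.
ELEMENTS = ("C", "H", "N", "O", "P", "S")

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")   # whole-string shape check
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")      # one element symbol plus optional count


def formula_to_counts(formula):
    """Parse a flat formula such as 'C16H32O2' into element counts."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"not a flat molecular formula: {formula!r}")
    counts = Counter()
    for symbol, number in _TOKEN.findall(formula):
        # A missing number means a single atom, e.g. the N in 'C42H82NO8P'.
        counts[symbol] += int(number) if number else 1
    return counts


def formula_to_features(formula, elements=ELEMENTS):
    """Return element counts in a fixed order; elements outside the list are ignored."""
    counts = formula_to_counts(formula)
    return [counts.get(element, 0) for element in elements]


# Palmitic acid (a fatty acyl) and a phosphatidylcholine (a glycerophospholipid).
print(formula_to_features("C16H32O2"))     # [16, 32, 0, 2, 0, 0]
print(formula_to_features("C42H82NO8P"))   # [42, 82, 1, 8, 1, 0]
```

Vectors of this form, paired with known lipid category labels, are the sort of table a Random Forest implementation could be trained on. The parser handles only flat formulas without parentheses, charges, or isotope labels.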

Original language: English
Article number: 122
Journal: Metabolites
Volume: 10
Issue number: 3
State: Published - Mar 2020

Bibliographical note

Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.

Funding

Funding: This research was supported in part by grants NSF 1419282 (PI Moseley) and NIH UL1TR001998-01 (PI Kern).

Funders and funder numbers:
National Science Foundation (NSF): 1419282
National Institutes of Health (NIH): UL1TR001998-01

    Keywords

    • Lipid category
    • Lipidomics
    • Machine learning
    • Metabolomics
    • Random Forest
    • SMIRFE

    ASJC Scopus subject areas

    • Endocrinology, Diabetes and Metabolism
    • Biochemistry
    • Molecular Biology
