Improving Deposition Quality and FAIRness of Metabolomics Workbench

Grants and Contracts Details


SPECIFIC AIMS We will accomplish the proposed methods and tools development through the following specific aims:

Specific Aim 1: Enable comprehensive capture, deposition, and validation of metabolomics experimental data and metadata. Starting from our mwtab Python library and other related open source packages (2-4), we will develop methods for the comprehensive capture of data and metadata from both the unstructured formats typically used in many analytical and wet labs and the organized data schemas in current laboratory information management systems (LIMS). As part of this development, these new methods will integrate data and metadata that conform to new metabolomics metadata standards being developed by multiple metabolomics standardization groups. Specifically, we headed an international team that developed an extension to the IUPAC International Chemical Identifier (InChI) that enables the use of InChI for unambiguous, programmatically accessible annotation of metabolite assignments to spectral features. We also developed extensions to the mwTab format that enable a more comprehensive capture of spectral and metabolite metadata. Based on our libraries and others that our collaborators have developed to access all major public repositories of metabolomics data, we will integrate these methods into new tools that comprehensively capture, validate, and convert (meta)data into native mwTab format amenable to quick public deposition of metadata-rich datasets into MWbench. We will also provide extensive documentation of the libraries, as well as their implemented use, as a blueprint for other metabolomics tool developers to follow.

Specific Aim 2: Improve FAIRness of Metabolomics Workbench. Starting from our current open source codebases, we will develop programmatic and command line methods that enable advanced utilization of the MWbench. Specifically, we are developing methods that will: i) download all relevant MWbench entries based on advanced search criteria through MWbench’s RESTful interface, ii) validate metadata quality in MWbench studies at distinct reusability levels, iii) harmonize field names across MWbench studies to enable meta-analyses across compatible sets of MWbench studies, v) allow experimentalists to utilize relevant sets of MWbench studies for evaluating assignments in new metabolomics datasets, and vi) evaluate error structure in MWbench studies to enable integration with other omics datasets. We will test and validate these new methods by integrating relevant human MWbench studies with Genotype-Tissue Expression (GTEx) datasets to derive gene-metabolite associations, which will be further validated against known gene-metabolite associations in publicly available metabolic network databases.

The major innovations that this proposal will develop are: i) effective metadata capture methods from unstructured formats, ii) effective harmonization methods for MWbench studies, iii) new error analysis methods that facilitate omics integration, and iv) new tools that facilitate public deposition to a designated metadata quality, with InChI tags, and in mwTab format for quicker and easier deposition. The significance of this proposal is in developing tools that: a) comprehensively capture, validate, deposit, and facilitate reuse of metadata-rich metabolomics data, b) improve the FAIRness of the MWbench, and c) enable integration of MWbench studies with other omics repositories. These new tools will greatly enhance the utility and usage of Metabolomics Workbench.
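To make the field-name harmonization step in Specific Aim 2 (item iii) more concrete, here is a minimal sketch of one way such a step could work. It is not the project's implementation and does not use the mwtab library's API. The canonical names, the variant lists, and the helpers `normalize`, `build_lookup`, and `harmonize` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical canonical field names and spelling variants (illustrative only;
# not taken from MWbench, the mwTab specification, or the mwtab library).
CANONICAL_FIELDS = {
    "subject_species": {"species", "subject species", "organism"},
    "sample_type": {"sample type", "sampletype", "tissue"},
}


def normalize(name):
    """Lowercase a field name and collapse punctuation, underscores, and
    whitespace into single spaces so that spelling variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def build_lookup(canonical):
    """Map every normalized variant (and the canonical name itself)
    to its canonical field name."""
    lookup = {}
    for target, variants in canonical.items():
        lookup[normalize(target)] = target
        for variant in variants:
            lookup[normalize(variant)] = target
    return lookup


def harmonize(record, lookup):
    """Rename the keys of one study's metadata record to canonical names.

    Returns the harmonized record plus the list of keys that could not be
    mapped, so they can be reviewed instead of being silently dropped."""
    harmonized, unmapped = {}, []
    for key, value in record.items():
        target = lookup.get(normalize(key))
        if target is None:
            unmapped.append(key)
        else:
            harmonized[target] = value
    return harmonized, unmapped


if __name__ == "__main__":
    lookup = build_lookup(CANONICAL_FIELDS)
    study = {"SUBJECT_SPECIES": "Homo sapiens", "Sample-Type": "Plasma", "Batch": "3"}
    print(harmonize(study, lookup))
```

Returning the unmapped keys separately is a deliberate choice in this sketch: fields that cannot be harmonized are surfaced for curation, which matters when deciding whether two studies are compatible enough for a meta-analysis.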
Effective start/end date: 9/18/20 to 8/31/22


  • Office of the Director: $302,804.00

