Grants and Contracts Details
Description
PROJECT SUMMARY
The practical reuse of genomics and transcriptomics datasets is well-demonstrated due to the use of
universal gene identifiers that facilitate the matching of features across these datasets, high feature coverage,
standardized metadata and data deposition formats, and a maturity in deposition quality and consistency.
However, metabolomics datasets are much harder to reuse due to the lack of standardized metabolite
feature identification, heterogeneity in feature coverage, and high variability in deposition quality and
consistency. Therefore, it is much harder to both find relevant metabolomics datasets from repositories
like Metabolomics Workbench (MWbench) and effectively reuse these datasets to generate and/or test
hypotheses. To address these difficulties in reusing metabolomics datasets, the quality of deposited datasets
must be improved. Furthermore, methods that enable the effective harmonization and utilization of MWbench
datasets are needed, especially for large meta-analyses and integrative multi-omics analyses. We are the
developers of the only set of available open-source tools for parsing, generating, validating, and depositing
mwTab formatted repository files. Our experience developing the open-source mwtab and MESSES Python
packages makes us uniquely qualified to develop methods to improve the FAIRness, especially Reusability, of
MWbench datasets. Also, we have provided periodic feedback to MWbench based on systematic evaluations of
the repository to enable the improvement of this growing public resource (2). Therefore, we propose to implement
new methods and open-source tools that will improve the FAIRness of MWbench datasets through the following
specific aims: Aim 1: Implement comprehensive repair, cleaning, and data harmonization tools for Metabolomics
Workbench; Aim 2: Disseminate repaired, cleaned, and harmonized Metabolomics Workbench datasets to the
broader biomedical community. The major innovations from this proposal are: i) advanced data repair and
cleaning methods to recover unusable MWbench datasets, ii) effective data harmonization that enables
multi-dataset analyses, iii) advanced search methods for finding relevant MWbench datasets, iv) effective
dissemination of open-source tools with application programming interfaces (APIs) and command line interfaces
(CLIs), implemented to high industry standards for coding, package management, and delivery, and v) a
lightweight, easy-to-maintain website (meta-repository) for cleaned dataset dissemination through a web user
interface (WebUI). The significance and overall impact of this proposal come from implementing
effective open-source tools that: a) comprehensively repair, clean, and harmonize MWbench datasets, greatly
improving the quality of MWbench datasets, especially for large-scale reuse and b) broadly disseminate these
greatly improved datasets through a combination of API, CLI, and WebUI methods. This combination of open-
source tools and open web infrastructure will enhance the overall utility and FAIRness of Metabolomics
Workbench, especially reuse, enabling large-scale meta-analyses and multi-omics data integrations.
| Status | Active |
|---|---|
| Effective start/end date | 9/27/25 → 8/31/27 |
Funding
- National Library of Medicine: $462,000.00