md_harmonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

A major challenge to integrating public metabolic resources is the use of different nomenclatures by individual databases. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across various metabolic databases. The md_harmonize package utilizes a neighborhood-specific graph coloring method for generating a unique identifier for each compound via atom identifiers based on a compound’s chemical structure. The resulting harmonized compounds and reactions can be used for various downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized using a variety of computational techniques to make certain NP-complete problems handled by the software tractable for these specific use cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.

Original language: English
Article number: 1199
Journal: Metabolites
Volume: 13
Issue number: 12
State: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Funding

The research was funded by the United States National Science Foundation (NSF), grant number 2020026.

Funders: U.S. Department of Energy; Chinese Academy of Sciences; Guangzhou Municipal Science and Technology Project; Oak Ridge National Laboratory; Extreme Science and Engineering Discovery Environment; National Science Foundation; National Energy Research Scientific Computing Center; National Natural Science Foundation of China
Funder number: 2020026

    Keywords

    • Python package
    • database harmonization
    • maximum common substructure
    • metabolite

    ASJC Scopus subject areas

    • Endocrinology, Diabetes and Metabolism
    • Biochemistry
    • Molecular Biology
