kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Background: The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the FAIR principles of data stewardship (findable, accessible, interoperable, and reusable) by providing RESTful access to its database entries through the web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, no existing software provides extensive command-line support for accessing and using KEGG.

Results: We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. In addition to an application programming interface (API) for Python programming, kegg_pull provides a command line interface (CLI) that supports a wide range of shell scripting and data analysis pipeline use cases. As its name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary, user-defined number of database entries from the KEGG API. This functionality is implemented to use multiple central processing unit cores efficiently, as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across one or multiple processes, with recommendations based on extensive testing and practical network considerations.

Conclusions: The new kegg_pull package enables flexible KEGG retrieval use cases not available in previous software packages. Its most notable new feature is the ability to robustly pull an arbitrary number of KEGG entries, including an entire KEGG database, with a single API method or CLI command. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances.
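The abstract describes pulling an arbitrary number of entries through the KEGG REST API with several workers and fault tolerance. The following is a minimal sketch of that general pattern using only the Python standard library; it is not kegg_pull's own API or implementation. It assumes the public endpoint `https://rest.kegg.jp`, a limit of 10 entries per `get` request, and the `///` line that terminates each KEGG flat-file entry. All helper names (`batches`, `get_url`, `split_flat_file`, `fetch_with_retry`, `pull_entries`) are hypothetical. It also swaps in a thread pool, which suits network-bound downloads, in place of the multiple processes the abstract describes.

```python
# Minimal sketch of batched, parallel, retrying KEGG downloads.
# All names here are hypothetical illustrations, not kegg_pull's API.
from __future__ import annotations

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

KEGG_REST_BASE = "https://rest.kegg.jp"  # public KEGG REST endpoint
MAX_ENTRIES_PER_GET = 10  # assumed per-request limit of the KEGG "get" operation

Fetcher = Callable[[str], str]


def batches(entry_ids: list[str], size: int = MAX_ENTRIES_PER_GET) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` entry IDs."""
    for start in range(0, len(entry_ids), size):
        yield entry_ids[start:start + size]


def get_url(entry_ids: list[str]) -> str:
    """Build one KEGG 'get' URL; multiple entry IDs are joined with '+'."""
    return f"{KEGG_REST_BASE}/get/{'+'.join(entry_ids)}"


def split_flat_file(text: str) -> list[str]:
    """Split a multi-entry flat-file response on the '///' entry terminator."""
    return [chunk.strip() + "\n///" for chunk in text.split("///") if chunk.strip()]


def fetch_url(url: str, timeout: float = 30.0) -> str:
    """Download one URL using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_retry(url: str, fetch: Fetcher, attempts: int = 3, delay: float = 1.0) -> str:
    """Call `fetch(url)`, retrying on network errors with a linearly growing pause."""
    last_error: OSError | None = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as error:  # urllib.error.URLError and timeouts subclass OSError
            last_error = error
            if attempt + 1 < attempts:  # no pause after the final attempt
                time.sleep(delay * (attempt + 1))
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error


def pull_entries(entry_ids: list[str], fetch: Fetcher = fetch_url, workers: int = 4,
                 attempts: int = 3, delay: float = 1.0) -> list[str]:
    """Download entries in batches, several batches at a time, preserving input order."""
    urls = [get_url(batch) for batch in batches(entry_ids)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `urls`.
        texts = list(pool.map(lambda url: fetch_with_retry(url, fetch, attempts, delay), urls))
    return [entry for text in texts for entry in split_flat_file(text)]
```

Calling `pull_entries(["C00001", "C00002"])` with the default `fetch_url` issues live requests to KEGG; passing a stand-in `fetch` function lets the batching and retry logic run offline. The sketch does not check whether every requested ID came back in a response, which a fuller tool would need to do. Keeping `workers` small is a deliberate choice: a shared public server may slow down or reject aggressive parallel clients, which is the kind of network consideration the abstract says kegg_pull's recommendations account for.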

Original language: English
Article number: 78
Journal: BMC Bioinformatics
Volume: 24
Issue number: 1
State: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2023, The Author(s).

Funding

EH and HNBM created the object-oriented design in multiple prototype-redesign cycles. EH implemented the software, automated unit and integrative testing, automated end-user documentation generation, and automated package distribution via PyPI. EH wrote the package documentation. HNBM reviewed both the implementation and the package documentation. EH wrote the initial draft of the manuscript. HNBM and EH revised the manuscript in multiple revision rounds. HNBM provided support via funded grants. Both authors read and approved the final manuscript.

Funders (funder number)
National Science Foundation Arctic Social Science Program: 2020026
National Institutes of Health (NIH): CF R03OD030603
Center for Selective C-H Functionalization, National Science Foundation
Center for Hierarchical Manufacturing, National Science Foundation
Office of Extramural Research, National Institutes of Health
Office of Research Infrastructure Programs, National Institutes of Health

    Keywords

    • Application programming interface
    • Command line interface
    • KEGG
    • Python programming language
    • REST

    ASJC Scopus subject areas

    • Structural Biology
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications
    • Applied Mathematics

