kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes

Erik Huckvale, Hunter N.B. Moseley

Research output: Contribution to journalArticlepeer-review

Abstract

Background: The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the principles of data stewardship to be findable, accessible, interoperable, and reusable (FAIR) by providing RESTful access to their database entries via their web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, there is no software that provides extensive command line level support for KEGG access and utilization. Results: We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. Not only does kegg_pull include an application programming interface (API) for Python programming, it also provides a command line interface (CLI) that enables utilization of KEGG for a wide range of shell scripting and data analysis pipeline use-cases. As kegg_pull’s name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary (user defined) number of database entries from the KEGG API. Moreover, this functionality is implemented to efficiently utilize multiple central processing unit cores as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across a single or multiple processes, with recommendations provided based on extensive testing and practical network considerations. Conclusions: The new kegg_pull package enables new flexible KEGG retrieval use cases not available in previous software packages. The most notable new feature that kegg_pull provides is its ability to robustly pull an arbitrary number of KEGG entries with a single API method or CLI command, including pulling an entire KEGG database. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances.

Original languageEnglish
Article number78
JournalBMC Bioinformatics
Volume24
Issue number1
DOIs
StatePublished - Dec 2023

Bibliographical note

Funding Information:
EH and HNBM created the objected oriented design in multiple prototype-redesign cycles. EH implemented the software, automated unit and integrative testing, automated end-user documentation generation, and automated package distribution via PyPI. EH wrote the package documentation. HNBM reviewed both the implementation and package documentation. EH wrote the initial draft of the manuscript. HNBM and EH revised the manuscript in multiple revision rounds. HNBM provided support via funded grants. Both authors read and approved the final manuscript.

Publisher Copyright:
© 2023, The Author(s).

Keywords

  • Application programming interface
  • Command line interface
  • KEGG
  • Python programming language
  • REST

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Fingerprint

Dive into the research topics of 'kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes'. Together they form a unique fingerprint.

Cite this