
Data storage architectures to accelerate chemical discovery: data accessibility for individual laboratories and the community

Research output: Contribution to journal › Review article › peer-review

19 Scopus citations

Abstract

As buzzwords like “big data,” “machine learning,” and “high-throughput” spread through chemistry, chemists need to consider their data storage, data management, and data accessibility more than ever, whether in their own laboratories or with the broader community. While it is commonplace for chemists to use spreadsheets for data storage and analysis, a move towards database architectures ensures that the data are more readily findable, accessible, interoperable, and reusable (FAIR). However, making this move poses several challenges for those with little or no knowledge of computer programming and databases. This Perspective presents the basics of data management using databases, with a focus on chemical data. We give an overview of database fundamentals by exploring the benefits of database use, introducing terminology, and establishing database design principles. We then detail the extract, transform, and load (ETL) process for database construction, which includes an overview of data parsing and of database architectures spanning Structured Query Language (SQL) and NoSQL structures. We close by cataloging overarching challenges in database design. This Perspective is accompanied by an interactive demonstration available at https://github.com/D3TaLES/databases_demo. We do all of this within the context of chemical data, with the aim of equipping chemists with the knowledge and skills to store, manage, and share their data while abiding by FAIR principles.
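The extract, transform, and load steps and the SQL versus NoSQL contrast named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from the linked demonstration repository; it uses only Python's standard library (`csv`, `sqlite3`, `json`), and the table name `molecules`, its columns, the helper functions, and the two example rows are hypothetical choices made for illustration. A spreadsheet export is extracted as CSV text, transformed into typed records, and then loaded either into a relational SQLite table with a fixed schema or into nested JSON documents of the kind a document-oriented (NoSQL) database stores.

```python
import csv
import io
import json
import sqlite3

# Hypothetical spreadsheet export (illustrative only): molecule name,
# SMILES string, and molecular weight in g/mol.
RAW_CSV = """name,smiles,molecular_weight_g_mol
benzene ,c1ccccc1,78.11
toluene,Cc1ccccc1,92.14
"""


def extract(text):
    """Extract: read CSV rows into dictionaries whose values are all strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim stray whitespace and convert numeric fields to float."""
    return [
        {
            "name": r["name"].strip(),
            "smiles": r["smiles"].strip(),
            "molecular_weight_g_mol": float(r["molecular_weight_g_mol"]),
        }
        for r in rows
    ]


def load_sql(records, conn):
    """Load (SQL): insert records into a table with a fixed, typed schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "smiles TEXT UNIQUE NOT NULL, "
        "molecular_weight_g_mol REAL)"
    )
    # Named placeholders are filled from each record dictionary.
    conn.executemany(
        "INSERT INTO molecules (name, smiles, molecular_weight_g_mol) "
        "VALUES (:name, :smiles, :molecular_weight_g_mol)",
        records,
    )
    conn.commit()


def to_documents(records):
    """Load (NoSQL-style): nest each record as a self-describing JSON document.

    Using the SMILES string as "_id" is an assumption made for this sketch.
    """
    return [
        json.dumps(
            {
                "_id": r["smiles"],
                "name": r["name"],
                "properties": {
                    "molecular_weight": {
                        "value": r["molecular_weight_g_mol"],
                        "unit": "g/mol",
                    }
                },
            }
        )
        for r in records
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    records = transform(extract(RAW_CSV))
    load_sql(records, conn)
    print(conn.execute(
        "SELECT name FROM molecules WHERE molecular_weight_g_mol > 80"
    ).fetchall())  # [('toluene',)]
    for doc in to_documents(records):
        print(doc)
```

The relational table enforces one schema for every row, so a query over a typed column is straightforward, whereas the document form lets each record carry nested properties and units without altering a shared schema. This is one way to picture the SQL versus NoSQL choice the abstract refers to, not a recommendation for either.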

Original language: English
Pages (from-to): 13646-13656
Number of pages: 11
Journal: Chemical Science
Volume: 13
Issue number: 46
State: Published (Nov 8 2022)

Bibliographical note

Publisher Copyright:
© 2022 The Royal Society of Chemistry.

Funding

This work was sponsored by the National Science Foundation in part through the Established Program to Stimulate Competitive Research (EPSCoR) Track 2 program under cooperative agreement number 2019574 and the Designing Materials to Revolutionize and Engineer our Future (NSF DMREF) program under award number DMR-1627428. We acknowledge the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their fantastic support and collaboration and use of the Lipscomb Compute Cluster and associated research computing resources.

Funders (funder number):
• University of Kentucky Medical Center
• National Science Foundation (NSF)
• Office of Experimental Program to Stimulate Competitive Research: 2019574, DMR-1627428

    ASJC Scopus subject areas

    • General Chemistry
