A review of mathematical representations of biomolecular data

Duc Duy Nguyen, Zixuan Cang, Guo Wei Wei

Research output: Contribution to journal, Review article, peer-review

63 Scopus citations

Abstract

Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.

Original language: English
Pages (from-to): 4343-4367
Number of pages: 25
Journal: Physical Chemistry Chemical Physics
Volume: 22
Issue number: 8
State: Published - Feb 28 2020

Bibliographical note

Publisher Copyright:
This journal is © the Owner Societies.

Funding

This work was supported in part by NSF Grants DMS-1721024, DMS-1761320, and IIS1900473, NIH grants GM126189 and GM129004, Bristol-Myers Squibb, and Pfizer. We thank Dr Kaifu Gao for his contribution to our team’s pose prediction in D3R Grand Challenge 4.

Funders (funder number):
National Science Foundation (NSF): IIS1900473, DMS-1721024, DMS-1761320
National Institutes of Health (NIH): GM126189
National Institute of General Medical Sciences: R01GM129004
Bristol-Myers Squibb
Pfizer

    ASJC Scopus subject areas

    • General Physics and Astronomy
    • Physical and Theoretical Chemistry
