EPSCoR: Robust and Reliable Mathematical Models for Biomolecular Data via Differential Geometry and Graph Theory

Grants and Contracts Details


A major trend of biological sciences in the 21st century is their transition from qualitative, phenomenological and descriptive to quantitative, analytical and predictive. Fundamental challenges that hinder the current understanding of biomolecular structure-function relationships, which is the central theme of biological sciences, are the tremendous structural complexity of biomolecules and excessively large datasets. These challenges call for innovative strategies. Modern mathematical methods, such as those based on differential geometry, algebraic topology and graph theory, are able to provide high-level abstractions of biomolecular systems. However, these methods have rarely been properly applied to the analysis of massive and diverse biomolecular datasets. The PI and his collaborators have recently made paradigm-shifting progress in devising modern mathematics for biomolecular data analysis. Specifically, the PI has developed graph-theory-based methods that won a number of contests in two recent D3R Grand Challenges, a worldwide competition series in computer-aided drug design, which ultimately tests our understanding of the biomolecular world and brings a direct benefit to human health. The objective of the present project is to develop new approaches based on spectral graph theory and differential geometry to revolutionize the current practice in biomolecular data analysis and modeling. First, the PI will introduce for the first time multiscale weighted colored algebraic graphs (spectral graphs) to reduce the structural complexity of biomolecular data. These methods will be tailored for various biological systems and properties, such as protein binding to protein, ligand, DNA and RNA, protein folding stability changes upon mutation, drug toxicity, solvation, solubility, and partition coefficient. Secondly, the PI will construct low-dimensional element interactive manifolds for the first time to properly encode chemical and biological information.
These methods will be carefully integrated with advanced machine learning or deep learning algorithms to uncover biomolecular structure-function relationships. Finally, the PI will extensively validate the proposed methods on a variety of datasets, optimize these mathematical learning strategies using parallel and GPU architectures, and develop user-friendly software packages or online servers for researchers who are not formally trained in mathematics or machine learning.

Intellectual merit: The importance of molecular biology and biophysics needs no introduction. The proposed research addresses grand challenges in understanding biomolecular structure-function relationships from massive datasets. These challenges are tackled through the introduction of new concepts in graph theory and differential geometry. This proposal offers innovative approaches to important areas in biomolecular modeling, data analysis, dimensionality reduction and mathematical biology.

Broader impact: The proposed research is transformative. As the first mathematics-based deep learning platform for biomolecular datasets, it will open a new direction and foster similar approaches in biological data analysis. Additionally, the new mathematical methods can be applied to other fields, such as chemistry and materials science. The proposed research has a solid educational component. The project will support the training of graduate and undergraduate students in data analysis, biological modeling, and algorithm development. The enhancement of curricula from the proposed research is planned as a continuation of the PIs’ teaching-research practice. The new mathematical framework is directly integrated into software packages to ensure extensive usage by the community of researchers throughout biology, computer science, and mathematics.
An undergraduate/graduate student training plan, a curriculum enhancement plan, an underrepresented group engagement and outreach plan, a result dissemination plan, and an industrial/academic consulting service plan are designed to further broaden the educational and societal impacts. The PI is committed to an integrated program of research and education.
Effective start/end date: 9/15/22 to 8/31/25


  • National Science Foundation

