DMS/NIGMS 1: Data-driven Ricci Curvatures and Spectral Graph for Machine Learning and Adaptive Virtual Screening

Grants and Contracts Details


A major trend in the biological sciences in the 21st century is their transition from qualitative, phenomenological and descriptive to quantitative, analytical and predictive. Fundamental challenges that hinder the current understanding of biomolecular structure-function relationships, which are the central theme of biological sciences, are the tremendous structural complexity of biomolecules and excessively large datasets. These challenges call for innovative strategies. Modern mathematical methods, such as those based on differential geometry, algebraic topology and graph theory, are able to provide high-level abstractions of biomolecular systems. However, these methods have rarely been properly applied to the analysis of massive and diverse biomolecular datasets. PI Nguyen and his collaborators have recently made paradigm-shifting progress in devising modern mathematics for biomolecular data analysis. Specifically, PI Nguyen has developed geometric graph theory based methods that won a number of contests in two recent D3R Grand Challenges, a worldwide competition series in computer-aided drug design, which ultimately tests our understanding of the biomolecular world and brings a direct benefit to human health. The objective of the present project is to develop new spectral graph theory and differential geometry based approaches to revolutionize the current practice in biomolecular data analysis and modeling. First, the PIs will introduce for the first time multiscale weighted colored algebraic graphs (spectral graphs) to reduce the structural complexity of biomolecular data. These methods will be tailored for various biological systems, such as protein binding to protein, ligand, DNA and RNA, protein folding stability changes upon mutation, drug toxicity, solvation, solubility, and partition coefficient. Secondly, the PIs will construct low-dimensional element interactive manifolds for the first time to properly encode chemical and biological information.
These methods will be carefully integrated with advanced machine learning or deep learning algorithms to uncover biomolecular structure-function relationships. Finally, the PIs will extensively validate the proposed methods on a variety of datasets, intelligently select optimal biomolecular structures, optimize these mathematical learning strategies using parallel and GPU architectures, and develop user-friendly software packages or online servers for researchers who are not formally trained in mathematics or machine learning.

Intellectual merit: The importance of molecular biology and biophysics needs no introduction. The proposed research addresses grand challenges in understanding biomolecular structure-function relationships from massive datasets. These challenges are tackled through the introduction of new concepts in graph theory and differential geometry. This proposal offers innovative approaches to important areas in biomolecular modeling, data analysis, dimensionality reduction and mathematical biology.

Broader impact: The proposed research is transformative. As the first mathematics-based deep learning platform for biomolecular datasets, it will open a new direction and foster similar approaches in biological data analysis. Additionally, the new mathematical methods can be applied to other fields, such as chemistry and materials science. The proposed research has a solid educational component. The project will support the training of graduate and undergraduate students in data analysis, biological modeling, and algorithm development. The enhancement of curricula from the proposed research is planned as a continuation of the PIs’ teaching-research practice. The new mathematical framework is directly integrated into software packages to ensure extensive usage by the community of researchers throughout biology, computer science, and mathematics.
Plans for undergraduate/graduate student training, curriculum enhancement, underrepresented group engagement and outreach, result dissemination, and industrial/academic consulting services are designed to further broaden educational and societal impacts. The PIs are committed to an integrated program of research and education.
Effective start/end date: 8/1/23 to 7/31/26


  • National Science Foundation

