Collaborative Research: ACED: Developing Consistency Model-Based Ultra-Long Stride Molecular Simulation to Unravel Long-Time Dynamics of Proteins (ALN 47.070)

Grants and Contracts Details

Description

Proteins are key molecules that play essential roles in many biological and engineering processes. Many proteins must fold into specific 3D structures to perform their functions. Recently developed deep learning tools such as AlphaFold2 can accurately predict proteins' plausible folded 3D structures. Nevertheless, understanding protein dynamics is as important as obtaining a folded structure. Many crucial changes in proteins occur over extended periods, ranging from milliseconds to seconds or longer. These timescales are beyond what typical molecular dynamics (MD) simulations can reach. Key examples include large conformational changes, protein-ligand interactions, allosteric regulation, protein-protein interactions, molecular motors, signal transduction, and protein aggregation. These processes are essential for protein function and are involved in various cellular activities and diseases. Studying them requires methods that enable long-stride MD simulations. Our research idea is inspired by the recent development of consistency models, which were introduced as an alternative to diffusion models. The principle of the diffusion model is similar to that of MD simulation: the system moves a small step at a time. Thus, a well-trained diffusion model should be able to predict the slight variation in protein 3D structure at each MD simulation step. Unlike the diffusion model, the consistency model can predict the variation of the system at any time from the initial conditions. In other words, a well-trained consistency model should be able to predict the protein 3D structure at any time after the initial condition. We propose to leverage this ability of the consistency model to develop a deep learning-based long-stride MD simulation engine. A consistency model provides a new approach for generating images in one step, as opposed to the iterative process used in diffusion models.
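To make the one-step idea concrete, the sketch below shows the generic consistency-function parameterization from the original image-generation formulation of consistency models (Song et al., 2023): f(x, t) = c_skip(t)·x + c_out(t)·F(x, t), where the skip weights force f(x, ε) = x. It is not the project's model. The constants (σ_data = 0.5, ε = 0.002, T = 80) are that paper's defaults, and `toy_network` is a hypothetical stand-in for a trained neural network. Generation needs a single evaluation of f, and optional extra steps re-noise and re-apply f to refine the result.

```python
import math
import random

SIGMA_DATA = 0.5  # assumed data standard deviation (default in the original formulation)
EPS = 0.002       # smallest time; the boundary condition requires f(x, EPS) == x
T_MAX = 80.0      # largest noise level used for one-step generation


def c_skip(t):
    # Equals 1 at t == EPS, so the input passes through unchanged there.
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Equals 0 at t == EPS, so the network output is ignored there.
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def toy_network(x, t):
    """Hypothetical stand-in for a trained network F_theta(x, t)."""
    return [xi / (1.0 + t) for xi in x]


def consistency_fn(x, t, net=toy_network):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t)."""
    fx = net(x, t)
    return [c_skip(t) * xi + c_out(t) * fi for xi, fi in zip(x, fx)]


def sample(dim, taus=(), seed=0, net=toy_network):
    """One-step generation from noise, optionally refined at times `taus` (each > EPS)."""
    rng = random.Random(seed)
    x_T = [rng.gauss(0.0, T_MAX) for _ in range(dim)]
    x = consistency_fn(x_T, T_MAX, net)  # a single network evaluation
    for tau in taus:  # optional multistep refinement
        scale = math.sqrt(tau**2 - EPS**2)
        x_tau = [xi + scale * rng.gauss(0.0, 1.0) for xi in x]
        x = consistency_fn(x_tau, tau, net)
    return x


if __name__ == "__main__":
    print(consistency_fn([1.0, -2.0], EPS))  # boundary condition: input returned unchanged
    print(sample(4))                          # one-step sample
    print(sample(4, taus=(10.0, 1.0)))        # two extra refinement steps
```

In contrast, a diffusion sampler would call the network at every step of a long noise schedule. In the analogy drawn above, that corresponds to advancing an MD trajectory one small step at a time, while the consistency function jumps directly to the end point.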
Compared with diffusion models, consistency models can generate images significantly faster. In our case, we can apply this method by treating the distance matrix among a protein's backbone atoms as an image. The research is based on two hypotheses: (a) a well-trained diffusion model can predict the small step-by-step changes in protein 3D structure over a short MD simulation, and (b) a consistency model can be developed to predict the variation in protein 3D structure over a long time step. Driven by these two hypotheses, the proposed research includes two tasks: (a) develop diffusion models to predict the variation of protein 3D structure over small time intervals, and (b) develop consistency models to predict the variation of protein 3D structure over large time intervals. The research will be conducted by the Shao group at the University of Kentucky and the Xu group at the University of Missouri. The Shao group has expertise in protein modeling, and the Xu group has expertise in deep learning; the two groups already collaborate on protein language models. We will use model protein systems, such as polyalanine and polyglycine chains, and small proteins, such as Trp-cage, to illustrate the performance of the developed models. The research will also explore suitable encoders and representations for protein 3D structures and suitable architectures for the diffusion and consistency models, and will generate preliminary data for future proposals. The expected outcomes include (a) consistency models that can predict the variation in protein 3D structure over a long time step (such as hundreds of femtoseconds), and (b) a framework, built on the development and deployment of the consistency model, for predicting protein folding pathways and other long-time events.
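The sketch below illustrates the distance-matrix-as-image representation and one way to build training pairs for the two tasks. Each MD frame (a list of backbone-atom coordinates) is turned into a pairwise distance matrix. Its entries are scaled to [0, 1] pixel intensities, and frame i is paired with frame i + stride. A stride of 1 gives short-interval pairs of the kind the diffusion-model task targets, and a large stride gives long-interval pairs for the consistency-model task. All function names and the 20 Å clipping distance are illustrative assumptions, not the project's actual pipeline.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between backbone atoms (N x N, symmetric, zero diagonal)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def to_image(dmat, d_max=20.0):
    """Scale distances to [0, 1] pixel intensities; distances beyond d_max (assumed, in Å) clip to 1."""
    return [[min(d, d_max) / d_max for d in row] for row in dmat]


def stride_pairs(trajectory, stride):
    """Pair the image of frame i with the image of frame i + stride.

    stride=1 gives short-interval pairs; a large stride gives long-interval pairs.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    images = [to_image(distance_matrix(frame)) for frame in trajectory]
    return [(images[i], images[i + stride]) for i in range(len(images) - stride)]


if __name__ == "__main__":
    # Hypothetical 3-atom "trajectory" of 5 frames: the chain slowly stretches along x.
    traj = [[(0.0, 0.0, 0.0), (3.8 + 0.1 * k, 0.0, 0.0), (7.6 + 0.2 * k, 0.0, 0.0)]
            for k in range(5)]
    short = stride_pairs(traj, 1)
    long_ = stride_pairs(traj, 4)
    print(len(short), len(long_))  # number of (input, target) pairs for each stride
```

Distance matrices are unchanged by rigid rotations and translations of the protein, which makes them convenient model inputs. Mirror-image structures share the same matrix, however, and converting a predicted matrix back to 3D coordinates needs an extra reconstruction step (for example, multidimensional scaling).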
Status: Active
Effective start/end date: 8/1/25 to 7/31/27

Funding

  • National Science Foundation: $250,000.00
