Robust Preconditioned Gradient Descent Algorithms for Deep Learning

Grants and Contracts Details

Description

Robust Preconditioned Gradient Descent Algorithms for Deep Neural Networks

Deep neural network based machine learning, or deep learning, has enjoyed tremendous success in many real-world applications and has contributed to the big-data revolution. A key task in deep learning is the optimization (also called training) of neural networks. The high complexity and nonlinearity of deep neural networks pose great challenges to traditional optimization algorithms. As a result, several specialized optimization methods based on gradient descent have been developed within the past decade. While these methods have significantly improved the landscape of neural network optimization in practice, most of them are derived with heuristic arguments, and their implementations require careful selection of associated hyper-parameters that can be critical to their performance. In general, there is a lack of clear understanding of the convergence behavior of these methods.

This project aims to advance the theory and the state-of-the-art algorithms for neural network optimization. We will develop two new classes of optimization algorithms that are built on the framework of traditional preconditioning or conjugate gradient methods but incorporate ideas from successful specialized deep learning optimizers. Our proposed algorithms will retain the efficiency of popular deep learning optimization methods while enhancing their convergence speed, robustness, and theoretical understanding. We have carried out preliminary work on a conjugate gradient style adaptive momentum method and an almost diagonal preconditioning method based on batch normalization (BN); an illustrative sketch of a generic preconditioned update of this type appears at the end of this description. Our testing has shown very promising results. This project will further this work along the following three research topics: 1) we propose to develop a new class of adaptive momentum based methods, with analysis, to improve robustness and to eliminate the momentum tuning of several existing methods; 2) we propose to fully develop batch normalization based preconditioning as a theoretically sound and algorithmically flexible alternative to BN; 3) we propose to redevelop the current empirically derived implementations of BN for convolutional and recurrent neural networks using the preconditioning framework.

Intellectual Merit: A deep neural network typically involves a highly nonlinear function of a very large number of variables, making second order optimization algorithms such as Newton's method inefficient. The loss function is not only non-convex but also full of very flat and highly oscillatory regions. Efficient optimization of such functions is an extremely challenging problem. We propose to exploit special structures of deep learning problems within existing theoretical frameworks. Our proposed algorithms will retain some elements of popular deep learning optimization methods for convergence acceleration while enhancing their robustness and theoretical understanding.

Broader Impacts: Deep learning has been successfully used in a variety of applications in data science; computer vision, speech recognition, natural language processing, financial data analysis, and bioinformatics are some examples of research areas that have seen significant impacts. This project will improve the robustness of optimization algorithms for neural networks, helping practitioners apply deep learning models to their applications more efficiently.
To accelerate dissemination of the research results to the user communities and to promote real-world applications, we plan to share the computer codes developed in this project on the open-source platform GitHub. The proposed research lies at the interface between mathematics, computer science, and statistics and provides an ideal setting for research cross-fertilization and collaboration as well as for training graduate students in interdisciplinary research. In this regard, our perspective from the numerical analysis point of view will bring fresh ideas and new approaches to the field. We have collaborated with colleagues in different fields on applications of deep learning models, and we will continue pursuing further collaboration opportunities.
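For illustration only, the Python sketch below shows one step of a generic diagonally preconditioned gradient descent with momentum, the basic kind of update the proposed methods build on. It is not the project's algorithm; the function name and the hyper-parameters lr, beta, and precond_diag are assumptions chosen for this example. In the proposed research, a quantity such as the momentum parameter beta would be chosen adaptively in a conjugate gradient style rather than hand-tuned, and the diagonal preconditioner would be derived from batch normalization statistics.

    import numpy as np

    def preconditioned_momentum_step(w, v, grad, precond_diag, lr=1e-2, beta=0.9):
        # Hypothetical illustration, not the project's method.
        # w: current parameters; v: momentum (velocity) from the previous step;
        # grad: gradient of the loss at w; precond_diag: diagonal preconditioner
        # (e.g. running second-moment or BN-style variance estimates).
        step = grad / (np.sqrt(precond_diag) + 1e-8)  # elementwise preconditioning of the gradient
        v = beta * v - lr * step                      # fixed momentum; adaptive schemes would set beta automatically
        return w + v, v                               # updated parameters and updated momentum

    # Example usage with random data:
    rng = np.random.default_rng(0)
    w, v = rng.standard_normal(5), np.zeros(5)
    grad, precond = rng.standard_normal(5), np.ones(5)
    w, v = preconditioned_momentum_step(w, v, grad, precond)

With precond_diag equal to the all-ones vector, the step reduces to plain gradient descent with heavy-ball momentum; a nontrivial diagonal rescales each coordinate of the gradient, which is the role batch normalization based preconditioning is intended to play.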
Status: Active
Effective start/end date: 8/1/22 – 7/31/25

Funding

  • National Science Foundation: $335,978.00
