Grants and Contracts Details
Description
Robust Preconditioned Gradient Descent Algorithms for Deep Neural Networks
Deep neural network based machine learning, or deep learning, has enjoyed tremendous
success in many real-world applications and has contributed to the big-data revolution. A
A key task in deep learning is the optimization (also called training) of neural networks. The high complexity and nonlinearity of deep neural networks pose great challenges to traditional optimization algorithms. As a result, several specialized optimization methods based on gradient descent have been developed over the past decade. While these methods have significantly improved neural network optimization in practice, most of them are derived from heuristic arguments, and their implementations require careful selection of associated hyper-parameters that can be critical to their performance. In general, the convergence behavior of these methods is not well understood.
This project aims to advance the theory and state-of-the-art algorithms for neural network optimization. We will develop two new classes of optimization algorithms that are built on the framework of traditional preconditioning or conjugate gradient methods but incorporate ideas from successful specialized deep learning optimizers. Our proposed algorithms will retain the efficiency of popular deep learning optimization methods while enhancing their convergence speed, robustness, and theoretical understanding.
We have carried out preliminary work on a conjugate gradient style adaptive momentum method and an almost diagonal preconditioning method based on batch normalization (BN). Our testing has shown very promising results. This project will further this work along the following three research topics: 1) we propose to develop a new class of adaptive momentum based methods, with analysis, to improve robustness and to eliminate the momentum tuning required by several existing methods; 2) we propose to fully develop batch normalization based preconditioning as a theoretically sound and algorithmically flexible alternative to BN; 3) we propose to redevelop the current empirically derived implementations of BN for convolutional and recurrent neural networks using the preconditioning framework.
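To make the general direction of topic 1 concrete (this is only an illustrative sketch, not the project's specific algorithm or analysis), the snippet below replaces the fixed momentum hyper-parameter of a standard momentum update with a Polak-Ribière-style coefficient computed from successive gradients, in the spirit of conjugate gradient methods. The function name, clipping at zero, and the toy quadratic problem are assumptions made for illustration.

```python
import numpy as np

def cg_style_momentum_step(w, grad, prev_grad, direction, lr=1e-2):
    """One gradient step with a conjugate-gradient-style adaptive momentum.

    Instead of a fixed momentum constant (e.g., 0.9), the coefficient beta is
    recomputed each step from the current and previous gradients using a
    Polak-Ribiere-type formula, clipped at zero for stability.
    """
    # Adaptive momentum coefficient: no manual momentum tuning required.
    beta = max(0.0, grad @ (grad - prev_grad) / (prev_grad @ prev_grad + 1e-12))
    direction = -grad + beta * direction   # new search direction
    w = w + lr * direction                 # parameter update
    return w, direction

# Toy usage on a quadratic loss L(w) = 0.5 * w^T A w - b^T w
rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 50.0, 10))    # ill-conditioned diagonal Hessian
b = rng.standard_normal(10)
w = np.zeros(10)
direction = np.zeros(10)
prev_grad = np.ones(10)                    # placeholder for the first step
for _ in range(200):
    grad = A @ w - b
    w, direction = cg_style_momentum_step(w, grad, prev_grad, direction)
    prev_grad = grad
```

The only point of the sketch is that the momentum coefficient can be derived from gradient information rather than tuned by hand; the project's actual methods and their convergence analysis are developed in the proposal itself.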
Intellectual Merit: A deep neural network typically involves a highly nonlinear function in a very large number of variables, making second order optimization algorithms such as Newton's method inefficient. The loss function is not only non-convex but also full of very flat and highly oscillatory regions. Efficient optimization of such functions is an extremely challenging problem. We propose to exploit special structures of deep learning problems within existing theoretical frameworks. Our proposed algorithms will retain some elements of popular deep learning optimization methods for convergence acceleration while enhancing their robustness and theoretical understanding.
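For context on the preconditioning framework the project builds on, the following standard identity (not a result specific to this proposal; the symbols L, P, and the step size alpha are generic) shows that preconditioned gradient descent is plain gradient descent after a change of variables, so a well-chosen P improves the conditioning of the problem seen by a first-order method:

```latex
% Gradient descent on L(w) with a symmetric positive definite preconditioner P,
%   w_{k+1} = w_k - \alpha\, P^{-1} \nabla L(w_k),
% is equivalent to plain gradient descent in u = P^{1/2} w, where
% \tilde{L}(u) = L(P^{-1/2} u) and hence \nabla \tilde{L}(u) = P^{-1/2} \nabla L(w):
\[
  w_{k+1} = w_k - \alpha\, P^{-1} \nabla L(w_k)
  \quad\Longleftrightarrow\quad
  u_{k+1} = u_k - \alpha\, \nabla \tilde{L}(u_k),
  \qquad u = P^{1/2} w .
\]
```

This is the sense in which an (almost) diagonal transformation derived from batch normalization statistics, as in the project's preliminary work, can be interpreted as a preconditioner.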
Broader Impacts: Deep learning has been successfully used in a variety of applications in data science. Computer vision, speech recognition, natural language processing, financial data analysis, and bioinformatics are some examples of research areas that have seen significant impacts. This project will improve the robustness of optimization algorithms for neural networks, helping practitioners apply deep learning models to their applications more efficiently. To accelerate dissemination of the research results to user communities and promote real-world applications, we plan to share the computer code developed in this project on the open-source platform GitHub.
The proposed research lies at the interface between mathematics, computer science, and statistics and provides an ideal setting for research cross-fertilization and collaboration, as well as for training graduate students in interdisciplinary research. In this regard, our numerical analysis perspective will bring fresh ideas and new approaches to the field. We have collaborated with colleagues in different fields on applications of deep learning models, and we will continue to pursue further collaboration opportunities.
| Status | Active |
| --- | --- |
| Effective start/end date | 8/1/22 → 7/31/25 |
Funding
- National Science Foundation: $335,978.00