Abstract
In recent years, orthogonal matrices have been shown to be a promising approach to improving the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent the exploding gradient problem and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.
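The abstract does not spell out the transformation, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the commonly used scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D a diagonal matrix of ±1 entries, and it assumes the inverse is approximated by a truncated Neumann series, (I + A)^{-1} ≈ Σ_k (-A)^k, which converges only when the norm of A is below 1. All function names, the example matrix, and the truncation orders are hypothetical. The code is written in dependency-free Python for clarity rather than speed.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y, sign=1.0):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def neumann_inverse(A, order):
    """Approximate (I + A)^{-1} by the truncated series sum_{k=0}^{order} (-A)^k.

    Assumption: the norm of A is below 1, otherwise the series diverges.
    """
    n = len(A)
    neg_A = [[-a for a in row] for row in A]
    term = identity(n)   # current power (-A)^k, starting at k = 0
    total = identity(n)  # running partial sum
    for _ in range(order):
        term = matmul(term, neg_A)
        total = add(total, term)
    return total

def scaled_cayley(A, d, order):
    """Approximate W = (I + A)^{-1} (I - A) D, where D = diag(d) and d has +/-1 entries."""
    n = len(A)
    inv = neumann_inverse(A, order)
    i_minus_a = add(identity(n), A, sign=-1.0)
    D = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(inv, i_minus_a), D)

def orthogonality_error(W):
    """Largest absolute entry of W^T W - I; zero for an exactly orthogonal W."""
    n = len(W)
    G = matmul(transpose(W), W)
    return max(abs(G[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# Hypothetical small skew-symmetric matrix (A^T = -A) with norm well below 1.
A = [[0.0, 0.1, -0.2],
     [-0.1, 0.0, 0.05],
     [0.2, -0.05, 0.0]]
d = [1.0, -1.0, 1.0]

err_low = orthogonality_error(scaled_cayley(A, d, order=2))
err_high = orthogonality_error(scaled_cayley(A, d, order=12))
print(err_low, err_high)
```

In this sketch, raising the truncation order brings the result closer to an exactly orthogonal matrix, which is the trade-off a series-based approximation makes in exchange for avoiding an explicit matrix inverse.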
| Original language | English |
|---|---|
| Pages (from-to) | 2651-2676 |
| Number of pages | 26 |
| Journal | Neural Computation |
| Volume | 36 |
| Issue number | 12 |
| DOIs | |
| State | Published - Dec 2024 |
Bibliographical note
Publisher Copyright: © 2024 Massachusetts Institute of Technology.
Funding
We thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and use of the Lipscomb Compute Cluster and associated research computing resources. This research was supported in part by NSF under grants DMS-2053284, DMS-2151802, DMS-2208314, IIS-2327113, and the University of Kentucky Start-up fund.
| Funders | Funder number |
|---|---|
| University of Kentucky | |
| Kentucky Transportation Center, University of Kentucky | |
| National Science Foundation | DMS-2053284, DMS-2151802, DMS-2208314, IIS-2327113 |
ASJC Scopus subject areas
- Arts and Humanities (miscellaneous)
- Cognitive Neuroscience
Title
Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation