Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation

  • Vasily Zadorozhnyy
  • Edison Mucllari
  • Cole Pospisil
  • Duc Nguyen
  • Qiang Ye

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

In recent years, orthogonal matrices have been shown to be a promising way to improve the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments with our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.
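
The abstract does not spell out the construction, so the following is only a rough illustration of the general idea, not the paper's algorithm. It assumes the commonly used scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D diagonal with ±1 entries, and replaces the matrix inverse with a truncated Neumann series, (I + A)^{-1} ≈ Σ_k (-A)^k, which converges when the spectral radius of A is below 1. All function names, the example matrix, and the term counts are hypothetical; how NC-GRU actually uses the Neumann series during training may differ.

```python
# Illustrative sketch only: scaled Cayley transform with a truncated
# Neumann series in place of an explicit inverse. Names are hypothetical.

def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(X, Y, sign=1.0):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def neumann_inverse(A, terms):
    """Approximate (I + A)^{-1} by sum_{k=0}^{terms-1} (-A)^k.

    Assumes the spectral radius of A is below 1, otherwise the
    series does not converge.
    """
    n = len(A)
    neg_A = [[-a for a in row] for row in A]
    total = identity(n)   # k = 0 term
    power = identity(n)   # running power (-A)^k
    for _ in range(terms - 1):
        power = matmul(power, neg_A)
        total = add(total, power)
    return total

def scaled_cayley(A, d, terms):
    """W = (I + A)^{-1} (I - A) D, with D = diag(d) and d[j] in {+1, -1}."""
    n = len(A)
    inv = neumann_inverse(A, terms)
    W = matmul(inv, add(identity(n), A, sign=-1.0))
    # Right-multiplying by a diagonal D scales column j by d[j].
    return [[W[i][j] * d[j] for j in range(n)] for i in range(n)]

def orthogonality_error(W):
    """Largest absolute entry of W^T W - I."""
    n = len(W)
    G = matmul(transpose(W), W)
    return max(abs(G[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# Small skew-symmetric example (A^T = -A) with small entries.
A = [[0.0, 0.2, -0.1],
     [-0.2, 0.0, 0.3],
     [0.1, -0.3, 0.0]]
d = [1.0, -1.0, 1.0]

err_few = orthogonality_error(scaled_cayley(A, d, terms=3))
err_many = orthogonality_error(scaled_cayley(A, d, terms=40))
print("orthogonality error, 3 terms: ", err_few)
print("orthogonality error, 40 terms:", err_many)
```

With the exact inverse, W is orthogonal for any skew-symmetric A. With the truncated series it is only approximately so, and the deviation shrinks as more terms are kept, which is the trade-off this sketch is meant to make visible.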

Original language: English
Pages (from-to): 2651-2676
Number of pages: 26
Journal: Neural Computation
Volume: 36
Issue number: 12
State: Published - Dec 2024

Bibliographical note

Publisher Copyright:
© 2024 Massachusetts Institute of Technology.

Funding

We thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and use of the Lipscomb Compute Cluster and associated research computing resources. This research was supported in part by NSF under grants DMS-2053284, DMS-2151802, DMS-2208314, IIS-2327113, and the University of Kentucky Start-up fund.

Funders (funder number)

  • University of Kentucky
  • Kentucky Transportation Center, University of Kentucky
  • National Science Foundation Arctic Social Science Program (DMS-2053284, IIS-2327113, DMS-2208314, DMS-2151802)

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Cognitive Neuroscience
