Breaking Time Invariance: Assorted-Time Normalization for RNNs

Cole Pospisil, Vasily Zadorozhnyy, Qiang Ye

Research output: Contribution to journal › Article › peer-review

Abstract

Methods such as Layer Normalization (LN) and Batch Normalization have proven to be effective in improving the training of Recurrent Neural Networks (RNNs). However, existing methods normalize using only the instantaneous information at one particular time step, and the result of the normalization is a preactivation state with a time-independent distribution. This implementation fails to account for certain temporal differences inherent in the inputs and the architecture of RNNs. Since these networks share weights across time steps, it may also be desirable to account for the connections between time steps in the normalization scheme. In this paper, we propose a normalization method called Assorted-Time Normalization (ATN), which preserves information from multiple consecutive time steps and normalizes using them. This setup allows us to introduce longer time dependencies into the traditional normalization methods without introducing any new trainable parameters. We present theoretical derivations for the gradient propagation and prove the weight scaling invariance property. Our experiments applying ATN to LN demonstrate consistent improvement on various tasks, such as Adding, Copying, and Denoise Problems and Language Modeling Problems.
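The contrast the abstract draws between single-step and multi-step normalization can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the abstract does not give the ATN formula, so the pooling rule here (mean and standard deviation taken over the preactivations of the last k time steps) is an assumption, and the names `layer_norm`, `assorted_time_norm`, and `history` are hypothetical. The learned gain and bias that LN normally carries are omitted for brevity.

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    """Plain LN without gain/bias: statistics come from the current step only."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean()) / (a.std() + eps)

def assorted_time_norm(history, eps=1e-5):
    """Illustrative multi-step variant (assumed pooling rule, not the paper's exact one).

    history: array-like of shape (k, hidden), oldest step first;
             history[-1] is the current preactivation.
    The current step is normalized with the mean and standard deviation
    pooled over all k stored steps, so no trainable parameters are added.
    """
    window = np.asarray(history, dtype=float)
    return (window[-1] - window.mean()) / (window.std() + eps)

# Two consecutive preactivations whose overall level changes over time.
history = np.array([[0.0, 0.0],
                    [2.0, 2.0]])

ln_out = layer_norm(history[-1])         # current step only
atn_out = assorted_time_norm(history)    # current step, window statistics
single_step = assorted_time_norm(history[-1:])  # window of length 1
```

In this toy case the single-step normalization maps the constant vector to zeros, discarding the shift between time steps, while the windowed version keeps it as a positive offset. With a window of length 1 the sketch reduces to the plain LN function above.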

Original language: English
Article number: 78
Journal: Neural Processing Letters
Volume: 56
Issue number: 2
State: Published - Apr 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.

Keywords

  • ATN
  • LN
  • LSTM
  • Layer Normalization
  • Normalization methods

ASJC Scopus subject areas

  • Software
  • General Neuroscience
  • Computer Networks and Communications
  • Artificial Intelligence
