Dynamic temporal residual network for sequence modeling

Ruijie Yan, Liangrui Peng, Shanyu Xiao, Michael T. Johnson, Shengjin Wang

Research output: Contribution to journal › Article › peer-review

Abstract

The long short-term memory (LSTM) network with its gating mechanism has been widely used in sequence modeling tasks such as handwriting and speech recognition. Because an LSTM network can be unfolded along the temporal dimension, its temporal depth equals the length of the input feature sequence, and gating alone may not be sufficient to fully model the dynamic temporal dependencies in sequential data. Inspired by residual learning in ResNet, this paper proposes a dynamic temporal residual network (DTRN) that incorporates residual learning into an LSTM network along the temporal dimension. DTRN comprises two networks: a primary network of modified LSTM units with weighted shortcut connections between adjacent temporal outputs, and a secondary network that generates dynamic weights for those shortcut connections. To validate the performance of DTRN, we conduct experiments on three commonly used public handwriting recognition datasets (IFN/ENIT, IAM, and Rimes) and one speech recognition dataset (TIMIT). The experimental results show that the proposed DTRN outperforms previously reported methods.
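To make the described architecture concrete, the following is a minimal PyTorch sketch reconstructed from the abstract alone. It is not the authors' published formulation: the class name DTRNSketch, the form of the secondary network (a sigmoid gate over the current input and previous output), and the combination rule h_t = LSTM(x_t, h_{t-1}) + w_t * h_{t-1} are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of the DTRN idea, based only on the abstract's
# description: a primary LSTM with weighted shortcut connections between
# adjacent temporal outputs, and a secondary network that produces the
# shortcut weights dynamically. All names and equations are assumptions.
import torch
import torch.nn as nn

class DTRNSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Primary network: a plain LSTM cell stands in for the paper's
        # "modified LSTM units".
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # Secondary network: produces a per-unit dynamic weight in (0, 1)
        # for the temporal shortcut connection.
        self.weight_net = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, input_size).
        batch, steps, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(steps):
            h_new, c = self.cell(x[:, t], (h, c))
            # Dynamic weight for the shortcut, conditioned on the current
            # input and the previous temporal output (assumed form).
            w = self.weight_net(torch.cat([x[:, t], h], dim=-1))
            # Weighted residual connection between adjacent temporal
            # outputs: the previous output is carried forward directly.
            h = h_new + w * h
            outputs.append(h)
        return torch.stack(outputs, dim=1)

# Example: a batch of 8 sequences, 50 frames of 40-dim features each.
model = DTRNSketch(input_size=40, hidden_size=128)
out = model(torch.randn(8, 50, 40))  # -> shape (8, 50, 128)
```

Because the shortcut weight is produced per time step from the data, the strength of the temporal residual path can vary across a sequence, which is the "dynamic" aspect the abstract emphasizes.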

Original language: English
Pages (from-to): 235-246
Number of pages: 12
Journal: International Journal on Document Analysis and Recognition
Volume: 22
Issue number: 3
DOIs
State: Published - Sep 1 2019

Bibliographical note

Publisher Copyright:
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords

  • Long short-term memory
  • Off-line handwriting recognition
  • Residual learning
  • Speech recognition

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
