On the regularization of convolutional kernel tensors in neural networks

Pei Chang Guo, Qiang Ye

Research output: Contribution to journal › Article › peer-review

Abstract

The convolutional neural network is an important model in deep learning, in which a convolution operation can be represented by a tensor. To avoid exploding or vanishing gradients and to improve the generalizability of a neural network, it is desirable that the convolution operation nearly preserve the norm, that is, that the singular values of the transformation matrix corresponding to the tensor be bounded around 1. We propose a penalty function that constrains the singular values of the transformation matrix to lie near 1, and we derive an algorithm to carry out gradient descent minimization of this penalty function with respect to the convolution kernel tensors. Numerical examples are presented to demonstrate the effectiveness of the method.
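
The abstract states the approach only at a high level. As a rough illustration of the idea, the sketch below (in PyTorch) builds the dense transformation matrix of a single-channel 2D convolution with circular padding, applies a simple penalty of the form sum_i (sigma_i - 1)^2 to its singular values, and minimizes it by plain gradient descent over the kernel entries. The penalty form, the single-channel circular-padding setting, the step size, and the helper names conv_matrix and penalty are illustrative assumptions and do not reproduce the paper's exact formulation or algorithm.

    import torch

    def conv_matrix(kernel, n):
        # Dense (n*n)-by-(n*n) transformation matrix of the 2D convolution with
        # `kernel`, acting on n-by-n inputs with circular (periodic) padding.
        # Single input/output channel and kernel size <= n, for illustration.
        k = kernel.shape[0]
        M = torch.zeros(n * n, n * n, dtype=kernel.dtype)
        for i in range(n):                  # output pixel row
            for j in range(n):              # output pixel column
                for p in range(k):
                    for q in range(k):
                        col = ((i + p) % n) * n + ((j + q) % n)
                        M[i * n + j, col] = kernel[p, q]
        return M

    def penalty(kernel, n):
        # Illustrative penalty: sum_i (sigma_i - 1)^2 over the singular values
        # of the transformation matrix; it is small when the convolution is
        # close to an isometry (all singular values near 1).
        sigma = torch.linalg.svdvals(conv_matrix(kernel, n))
        return torch.sum((sigma - 1.0) ** 2)

    # Plain gradient descent on the kernel entries; autograd supplies the gradient.
    kernel = (0.1 * torch.randn(3, 3, dtype=torch.float64)).requires_grad_()
    lr, n = 1e-3, 8
    print("initial penalty:", penalty(kernel, n).item())
    for step in range(500):
        loss = penalty(kernel, n)
        loss.backward()
        with torch.no_grad():
            kernel -= lr * kernel.grad
            kernel.grad.zero_()
    print("final penalty:", penalty(kernel, n).item())  # typically much smaller

For realistic layer sizes one would not form the dense matrix explicitly; for a single channel with circular padding, for instance, the singular values are the moduli of the 2D DFT of the zero-padded kernel. The explicit matrix is used here only to keep the connection to the transformation matrix in the abstract transparent.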

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Linear and Multilinear Algebra
DOIs
State: Published - 2020

Bibliographical note

Funding Information:
Research supported by the Fundamental Research Funds for the Central Universities [Grant No. 2652019320] and the China Scholarship Council. This work was partly done while the first author was a visiting scholar at the Department of Mathematics, University of Kentucky, from July 2018 to July 2019. Research supported in part by NSF [Grant Nos. DMS-1821144 and DMS-1620082]. The authors are grateful to Professor Xinguo Liu at Ocean University of China and Professor Beatrice Meini at University of Pisa for their valuable suggestions.

Publisher Copyright:
© 2020, Informa UK Limited, trading as Taylor & Francis Group.

Keywords

  • Penalty function
  • convolutional layers
  • generalizability
  • transformation matrix
  • unstable gradient

ASJC Scopus subject areas

  • Algebra and Number Theory
