On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Peizhong Ju, Xiaojun Lin, Ness B. Shroff

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

In this paper, we study the generalization performance of minimum ℓ2-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound on the generalization error that approaches a small limiting value, even when the number of neurons p approaches infinity. This limiting value further decreases with the number of training samples n. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when n and p are both large.

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Pages: 5137-5147
Number of pages: 11
ISBN (Electronic): 9781713845065
State: Published - 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: Jul 18 2021 - Jul 24 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 7/18/21 - 7/24/21

Bibliographical note

Publisher Copyright:
Copyright © 2021 by the author(s)

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

