Abstract
In this paper, we study the generalization performance of minimum ℓ2-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation and no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics different from the “double descent” of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound on the generalization error that approaches a small limiting value even as the number of neurons p approaches infinity. This limiting value further decreases with the number of training samples n. For functions outside this class, we provide a lower bound on the generalization error that does not diminish to zero even when n and p are both large.
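As a rough illustration of the setup (not the authors' code), the sketch below builds an NTK-style feature map for a bias-free two-layer ReLU network from random first-layer weights and computes the minimum ℓ2-norm interpolating solution via the pseudoinverse. The dimensions, the ground-truth function, and all variable names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of min l2-norm overfitting in an NTK feature model.
# Feature map: gradient of the network output w.r.t. the first-layer
# weights, phi(x)_j = 1{w_j . x > 0} * x (no bias term), flattened to p*d dims.
rng = np.random.default_rng(0)
n, d, p = 50, 2, 2000            # samples, input dim, neurons (p >> n: overparameterized)

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # inputs on the unit sphere
y = np.sin(3 * np.arctan2(X[:, 1], X[:, 0]))     # illustrative ground-truth function

W = rng.standard_normal((p, d))                  # random first-layer weights, held at init

def ntk_features(X):
    act = (X @ W.T > 0).astype(float)            # ReLU activation pattern, shape (m, p)
    # outer product of activation pattern with input, flattened per sample
    return (act[:, :, None] * X[:, None, :]).reshape(X.shape[0], p * d) / np.sqrt(p)

Phi = ntk_features(X)
# Min l2-norm solution interpolating the training data (pseudoinverse = least-norm fit)
dw = np.linalg.pinv(Phi) @ y

X_test = rng.standard_normal((200, d))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_test = np.sin(3 * np.arctan2(X_test[:, 1], X_test[:, 0]))
test_mse = np.mean((ntk_features(X_test) @ dw - y_test) ** 2)
print(f"train residual: {np.linalg.norm(Phi @ dw - y):.2e}, test MSE: {test_mse:.3f}")
```

With p much larger than n, the training residual is numerically zero (the model overfits), while the test error reflects how well the target function can be represented under this feature map, which is the regime the paper's upper and lower bounds address.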
Original language | English |
---|---|
Title of host publication | Proceedings of the 38th International Conference on Machine Learning, ICML 2021 |
Pages | 5137-5147 |
Number of pages | 11 |
ISBN (Electronic) | 9781713845065 |
State | Published - 2021 |
Event | 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online |
Duration | Jul 18, 2021 → Jul 24, 2021 |
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Volume | 139 |
ISSN (Electronic) | 2640-3498 |
Conference
Conference | 38th International Conference on Machine Learning, ICML 2021 |
---|---|
City | Virtual, Online |
Period | 7/18/21 → 7/24/21 |
Bibliographical note
Publisher Copyright: Copyright © 2021 by the author(s)
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability