On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model

06/04/2022
by Peizhong Ju, et al.

In this paper, we study the generalization performance of overparameterized 3-layer NTK models. We show that, for a specific set of ground-truth functions (which we refer to as the "learnable set"), the test error of the overfitted 3-layer NTK is upper bounded by an expression that decreases with the number of neurons in the two hidden layers. Unlike the 2-layer NTK, which has only one hidden layer, the 3-layer NTK involves interactions between two hidden layers. Our upper bound reveals that, between the two hidden layers, the test error decreases faster with respect to the number of neurons in the second hidden layer (the one closer to the output) than with respect to the number of neurons in the first hidden layer (the one closer to the input). We also show that the learnable set of the 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the neurons. However, in terms of actual generalization performance, our results suggest that the 3-layer NTK is much less sensitive to the choice of bias than the 2-layer NTK, especially when the input dimension is large.
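To make the setting concrete, below is a minimal sketch (not the paper's exact construction) of an overfitted 3-layer NTK model: a bias-free 3-layer ReLU network is linearized around a random initialization, its gradient features phi(x) = d f(x; theta_0) / d theta serve as the feature map, and the coefficients are taken to be the minimum ℓ2-norm interpolator of the training data. The widths p1 and p2, the unit-sphere inputs, and the cosine ground-truth function are illustrative assumptions, not values from the paper.

    # Minimal sketch of an overfitted 3-layer NTK model (illustrative assumptions).
    import numpy as np

    rng = np.random.default_rng(0)
    d, p1, p2 = 10, 50, 50         # input dim, widths of hidden layers 1 and 2
    n = 40                         # number of training samples

    # Random, fixed initialization; no bias terms, matching the bias-free
    # 3-layer setting discussed in the abstract.
    W1 = rng.normal(size=(p1, d)) / np.sqrt(d)
    W2 = rng.normal(size=(p2, p1)) / np.sqrt(p1)
    w3 = rng.normal(size=p2) / np.sqrt(p2)

    def ntk_features(x):
        """Gradient of the network output w.r.t. all weights at initialization."""
        h1 = W1 @ x                          # pre-activations, layer 1
        a1 = np.maximum(h1, 0.0)             # ReLU
        h2 = W2 @ a1                         # pre-activations, layer 2
        a2 = np.maximum(h2, 0.0)
        # Backpropagate once to collect per-parameter gradients.
        g3 = a2                                      # d f / d w3
        delta2 = w3 * (h2 > 0)                       # sensitivity at layer 2
        g2 = np.outer(delta2, a1)                    # d f / d W2
        delta1 = (W2.T @ delta2) * (h1 > 0)          # sensitivity at layer 1
        g1 = np.outer(delta1, x)                     # d f / d W1
        return np.concatenate([g1.ravel(), g2.ravel(), g3])

    # Synthetic data from a hypothetical ground-truth function (an assumption
    # for illustration; the paper characterizes a specific learnable set).
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-sphere inputs
    y = np.cos(X @ rng.normal(size=d))               # stand-in ground truth

    Phi = np.stack([ntk_features(x) for x in X])     # n x p feature matrix
    # Min l2-norm interpolator: theta = pinv(Phi) @ y. Since p >> n, the model
    # is overparameterized and fits the training data exactly ("overfitted").
    theta = np.linalg.pinv(Phi) @ y

    x_test = rng.normal(size=d)
    x_test /= np.linalg.norm(x_test)
    print("prediction:", ntk_features(x_test) @ theta)

Because the number of parameters far exceeds the number of samples, the pseudoinverse returns the minimum-norm solution among all interpolators; it is the test error of this kind of overfitted solution that the paper's upper bound controls.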


