A type of generalization error induced by initialization in deep neural networks

05/19/2019
by Yaoyu Zhang, et al.

How different initializations and loss functions affect the learning of a deep neural network (DNN), specifically its generalization error, is an important problem in practice. In this work, focusing on regression problems, we develop a kernel-norm minimization framework for analyzing DNNs in the kernel regime, in which the number of neurons in each hidden layer is sufficiently large (Jacot et al. 2018, Lee et al. 2019). We find that, in the kernel regime, for any loss in a general class of functions, e.g., any Lp loss for 1 < p < ∞, the DNN converges to the same global minimum: the one nearest to the initial value in parameter space or, equivalently, the one closest to the initial DNN output in the corresponding reproducing kernel Hilbert space. Within this framework, we prove that a nonzero initial output increases the generalization error of the DNN. We further propose an antisymmetrical initialization (ASI) trick that eliminates this type of error and accelerates training. We also demonstrate experimentally that, even for DNNs outside the kernel regime, our theoretical analysis and the ASI trick remain effective. Overall, our work offers insight into how initialization and the loss function quantitatively affect the generalization of DNNs, and it provides guidance for training them.
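For concreteness, here is a minimal sketch of the ASI trick in PyTorch (the framework, class name, and layer sizes are my own illustrative choices, not specified in the abstract). The network is paired with an exact copy of itself, and the model output is the scaled difference of the two, so the initial output is identically zero while both copies remain trainable; the 1/sqrt(2) factor follows the paper's construction and keeps the induced kernel on the same scale as that of a single network.

    import copy
    import torch
    import torch.nn as nn

    class ASIModel(nn.Module):
        """Antisymmetrical initialization (ASI): output the scaled difference
        of a network and an identical copy, so the output at t = 0 is zero."""

        def __init__(self, base: nn.Module):
            super().__init__()
            self.f = base
            self.g = copy.deepcopy(base)  # identical parameters at initialization

        def forward(self, x):
            # Both copies are trained; they cancel exactly only at t = 0.
            return (self.f(x) - self.g(x)) / (2 ** 0.5)

    # Usage: any base architecture works; this two-layer tanh net is illustrative.
    base = nn.Sequential(nn.Linear(1, 256), nn.Tanh(), nn.Linear(256, 1))
    model = ASIModel(base)
    x = torch.randn(8, 1)
    print(model(x).abs().max())  # tensor(0.) at initialization

Training then proceeds as usual on the parameters of both copies; as gradient descent drives them apart, their difference fits the data, while the contribution of a nonzero initial output, which the abstract identifies as a source of extra generalization error, is removed.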

