Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

01/21/2020
by Jinlong Liu, et al.

As deep neural networks (DNNs) achieve tremendous success across many application domains, researchers have explored from many angles why they generalize well. In this paper, we provide a novel perspective on this question using the gradient signal to noise ratio (GSNR) of parameters during the training process of DNNs. The GSNR of a parameter is defined as the ratio between the squared mean and the variance of its gradient over the data distribution. Based on several approximations, we establish a quantitative relationship between the model parameters' GSNR and the generalization gap. This relationship indicates that a larger GSNR during training leads to better generalization performance. Moreover, we show that, unlike shallow models (e.g. logistic regression, support vector machines), the gradient descent optimization dynamics of DNNs naturally produce a large GSNR during training, which is probably the key to DNNs' remarkable generalization ability.
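
To make the definition concrete: writing g_j(x, y) for the gradient of the per-sample loss with respect to parameter θ_j, the GSNR described above is r(θ_j) = (E[g_j])^2 / Var[g_j], with the mean and variance taken over the data distribution. Below is a minimal sketch (hypothetical code, not taken from the paper) of how per-parameter GSNR could be estimated from per-sample gradients in PyTorch; the function name estimate_gsnr and the eps smoothing term are illustrative choices, not the authors'.

    import torch

    def estimate_gsnr(model, loss_fn, inputs, targets, eps=1e-12):
        """Estimate the GSNR of every parameter: the squared mean of the
        per-sample gradient divided by its variance across samples."""
        per_sample = {name: [] for name, _ in model.named_parameters()}
        for x, y in zip(inputs, targets):
            model.zero_grad()
            # Single-sample forward/backward to obtain a per-sample gradient.
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            for name, p in model.named_parameters():
                per_sample[name].append(p.grad.detach().clone())
        gsnr = {}
        for name, grads in per_sample.items():
            g = torch.stack(grads)                  # (n_samples, *param_shape)
            mean = g.mean(dim=0)
            var = g.var(dim=0, unbiased=False)
            gsnr[name] = mean.pow(2) / (var + eps)  # elementwise GSNR
        return gsnr

In practice the mean and variance would be estimated from a mini-batch or a sample of the training distribution; the loop over individual examples here simply makes the per-sample gradients explicit.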

Related research

09/13/2017 · Normalized Direction-preserving Adam
Optimization algorithms for training deep models not only affect the co...

01/04/2021 · Frequency Principle in Deep Learning Beyond Gradient-descent-based Training
The frequency perspective has recently made progress in understanding deep lear...

05/24/2019 · Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks
It remains a puzzle why deep neural networks (DNNs), with more para...

03/17/2022 · Confidence Dimension for Deep Learning based on Hoeffding Inequality and Relative Evaluation
Research on the generalization ability of deep neural networks (DNNs) ha...

01/17/2020 · DNNs as Layers of Cooperating Classifiers
A robust theoretical framework that can describe and predict the general...

11/14/2019 · Adversarial Margin Maximization Networks
The tremendous recent success of deep neural networks (DNNs) has sparked...

03/24/2022 · On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks
Adam and AdaBelief compute and make use of elementwise adaptive stepsize...
