Positively Scale-Invariant Flatness of ReLU Neural Networks

by Mingyang Yi, et al.

It was empirically observed by Keskar et al. [SharpMinima] that flatter minima generalize better. However, for the popular ReLU network, a sharp minimum can also generalize well [SharpMinimacan]. This demonstrates that existing definitions of flatness fail to account for the complex geometry of ReLU neural networks, because they do not cover the Positively Scale-Invariant (PSI) property of ReLU networks. In this paper, we formalize how the PSI property breaks existing definitions of flatness and propose a new description of flatness, PSI-flatness. PSI-flatness is defined on the values of basis paths [GSGD] instead of on the weights. The values of basis paths have been shown to be PSI-variables that can sufficiently represent a ReLU neural network, which ensures the PSI property of PSI-flatness. We then study the relation between PSI-flatness and generalization both theoretically and empirically. First, we derive a generalization bound based on PSI-flatness, which shows that the generalization error decreases with the ratio between the largest and the smallest basis path values. That is, a minimum with balanced basis path values is more likely to be flat and to generalize better. Finally, we visualize the PSI-flatness of the loss surface around two learned models, which indicates that the minimum with smaller PSI-flatness can indeed generalize better.
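The PSI property the abstract refers to can be illustrated directly: in a ReLU network, scaling the incoming weights of a hidden unit by any c > 0 and its outgoing weights by 1/c leaves the computed function unchanged, while weight-based flatness measures change. A minimal NumPy sketch of this invariance (the two-layer network and the chosen unit index are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, W2, x):
    return W2 @ relu(W1 @ x)

x = rng.normal(size=3)

# Positively scale-invariant transform: scale the incoming weights of
# hidden unit j by c > 0 and its outgoing weights by 1/c.
c, j = 5.0, 1
W1_s, W2_s = W1.copy(), W2.copy()
W1_s[j, :] *= c
W2_s[:, j] /= c

# The network's function is unchanged, since relu(c*z) = c*relu(z) for c > 0.
print(np.allclose(forward(W1, W2, x), forward(W1_s, W2_s, x)))  # True

# The product of weights along any input-output path (and hence any basis
# path value) is also invariant: (c * w1) * (w2 / c) = w1 * w2.
```

Because path values such as `W1[j, i] * W2[k, j]` are unchanged under this rescaling, a flatness measure defined on basis path values inherits the invariance that weight-space measures lack.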




