Spurious Local Minima of Shallow ReLU Networks Conform with the Symmetry of the Target Model

12/26/2019
by Yossi Arjevani, et al.

We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are assumed to be generated by a target network. Focusing first on standard Gaussian inputs, we show that the spurious local minima detected by stochastic gradient descent (SGD) exhibit, in a well-defined sense, the least loss of symmetry with respect to the target weights. A closer look at the analysis then indicates that this principle of least symmetry breaking may apply to a broader range of settings. Motivated by this, we conduct a series of experiments that corroborate this hypothesis for different classes of non-isotropic, non-product distributions, smooth activation functions, and networks with a few layers.
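
As a rough illustration of the setting described in the abstract, the following minimal sketch estimates the squared loss between a student and a target two-layer ReLU network on standard Gaussian inputs. Several details are assumptions made only for this example and are not taken from the abstract: the second-layer weights are fixed to 1, the target weights form an identity matrix, and the expectation is replaced by a Monte Carlo average. All function and variable names are hypothetical. The sketch also checks one elementary symmetry of such an objective: reordering the hidden neurons of the student leaves the loss unchanged.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def network(weights, x):
    # Two-layer ReLU network: sum over hidden neurons of relu(<w, x>).
    # Second-layer weights are fixed to 1 (an assumption of this sketch).
    return sum(relu(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in weights)


def sample_inputs(n, d, seed=0):
    # Draw n inputs with independent standard Gaussian coordinates.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def squared_loss(student, target, inputs):
    # Monte Carlo estimate of 0.5 * E[(f_student(x) - f_target(x))^2],
    # where labels are produced by the target network.
    total = 0.0
    for x in inputs:
        diff = network(student, x) - network(target, x)
        total += 0.5 * diff * diff
    return total / len(inputs)


d = 4
# Identity target weights (an assumption chosen for illustration).
target = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
inputs = sample_inputs(2000, d)

rng = random.Random(1)
student = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
permuted = student[1:] + student[:1]  # same neurons, different order

loss_at_target = squared_loss(target, target, inputs)
loss_student = squared_loss(student, target, inputs)
loss_permuted = squared_loss(permuted, target, inputs)
```

Here `loss_at_target` is exactly zero because the student coincides with the target, while `loss_student` and `loss_permuted` agree up to floating-point rounding. This only illustrates the kind of symmetry at play; it does not reproduce the analysis or the experiments summarized above.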


research
10/12/2022

Annihilation of Spurious Minima in Two-Layer ReLU Networks

We study the optimization problem associated with fitting two-layer ReLU...
research
12/24/2017

Spurious Local Minima are Common in Two-Layer ReLU Neural Networks

We consider the optimization problem associated with training simple ReL...
research
08/04/2020

Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry

We consider the optimization problem associated with fitting two-layers ...
research
07/21/2021

Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks

We study the optimization problem associated with fitting two-layer ReLU...
research
07/06/2021

Equivariant bifurcation, quadratic equivariants, and symmetry breaking for the standard representation of S_n

Motivated by questions originating from the study of a class of shallow ...
research
04/12/2021

Noether: The More Things Change, the More Stay the Same

Symmetries have proven to be important ingredients in the analysis of ne...
research
06/30/2023

The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks

We study the type of solutions to which stochastic gradient descent conv...
