Which Minimizer Does My Neural Network Converge To?

11/04/2020
by Manuel Nonnenmacher, et al.

The loss surface of an overparameterized neural network (NN) possesses many global minima with zero training error. We explain how common variants of the standard NN training procedure change the minimizer obtained. First, we make explicit how the initialization scale of a strongly overparameterized NN affects the minimizer and can degrade its final test performance, and we propose a strategy to limit this effect. Then, we demonstrate that for adaptive optimizers such as AdaGrad, the obtained minimizer generally differs from the gradient descent (GD) minimizer. This adaptive minimizer is changed further by stochastic mini-batch training, even though in the non-adaptive case GD and stochastic GD (SGD) converge to essentially the same minimizer. Finally, we explain that these effects remain relevant for less overparameterized NNs. While overparameterization has its benefits, our work highlights that it induces sources of error absent from underparameterized models, some of which can be challenging to control.
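The abstract's first two observations are easy to see in the simplest overparameterized setting. The sketch below is not from the paper; it uses overparameterized linear regression as a stand-in for a strongly overparameterized NN, with a textbook diagonal AdaGrad and illustrative hyperparameters (all names and constants are assumptions). It trains the same interpolation problem with full-batch GD and AdaGrad from a small and a large random initialization: every run should reach near-zero training error, yet the two optimizers land on different interpolators, and the large initialization hurts test error.

import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: d parameters, n < d samples, so the
# training loss has infinitely many global minima (exact interpolators).
n, d = 20, 100
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)        # hypothetical ground-truth weights
y = X @ theta_star
X_test = rng.standard_normal((500, d))
y_test = X_test @ theta_star


def grad(theta):
    """Gradient of the mean-squared training error."""
    return X.T @ (X @ theta - y) / n


def train_gd(theta, lr=0.1, steps=20_000):
    """Plain full-batch gradient descent."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


def train_adagrad(theta, lr=0.5, steps=20_000, eps=1e-8):
    """Full-batch AdaGrad: per-coordinate steps scaled by
    the accumulated squared gradients."""
    acc = np.zeros(d)
    for _ in range(steps):
        g = grad(theta)
        acc += g * g
        theta = theta - lr * g / (np.sqrt(acc) + eps)
    return theta


for init_name, theta0 in [("small init", 1e-3 * rng.standard_normal(d)),
                          ("large init", 3.0 * rng.standard_normal(d))]:
    sols = {}
    for opt_name, train in [("GD", train_gd), ("AdaGrad", train_adagrad)]:
        theta = train(theta0.copy())
        sols[opt_name] = theta
        print(f"{init_name:10s} {opt_name:8s} "
              f"train MSE {np.mean((X @ theta - y) ** 2):.1e}  "
              f"test MSE {np.mean((X_test @ theta - y_test) ** 2):8.1f}")
    # All runs interpolate the training data, yet the optimizers
    # converge to measurably different parameter vectors.
    print(f"{init_name:10s} ||theta_GD - theta_AdaGrad|| = "
          f"{np.linalg.norm(sols['GD'] - sols['AdaGrad']):.2f}")

With this seed, all four runs should interpolate the training data while GD and AdaGrad end at distinct minimizers, and the large-initialization runs should generalize worse, mirroring the abstract's claims in miniature. The intuition for the initialization effect: GD never moves the component of the parameters orthogonal to the data, so a large random initialization leaves a large spurious component in the final solution.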


Related research

03/22/2021: Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel
The Neural Tangent Kernel (NTK) has recently attracted intense study, as...

06/16/2021: Input Invex Neural Network
In this paper, we present a novel method to constrain invexity on Neural...

06/12/2014: A Cascade Neural Network Architecture investigating Surface Plasmon Polaritons propagation for thin metals in OpenMP
Surface plasmon polaritons (SPPs) confined along metal-dielectric interf...

03/08/2021: Nondeterminism and Instability in Neural Network Optimization
Nondeterminism in neural network optimization produces uncertainty in pe...

05/03/2023: Select without Fear: Almost All Mini-Batch Schedules Generalize Optimally
We establish matching upper and lower generalization error bounds for mi...

02/25/2019: Modularity as a Means for Complexity Management in Neural Networks Learning
Training a Neural Network (NN) with lots of parameters or intricate arch...
