On Connectivity of Solutions in Deep Learning: The Role of Over-parameterization and Feature Quality

02/18/2021
by Quynh Nguyen, et al.

It has been empirically observed that, in deep neural networks, the solutions found by stochastic gradient descent from different random initializations can often be connected by a path with low loss. Recent works have shed light on this intriguing phenomenon by assuming either the over-parameterization of the network or the dropout stability of the solutions. In this paper, we reconcile these two views and present a novel condition for ensuring the connectivity of two arbitrary points in parameter space. This condition is provably milder than dropout stability, and it provides a connection between the problem of finding low-loss paths and the memorization capacity of neural nets. This last point brings about a trade-off between the quality of features at each layer and the over-parameterization of the network. As an extreme example of this trade-off, we show that (i) if subsets of features at each layer are linearly separable, then almost no over-parameterization is needed, and (ii) under generic assumptions on the features at each layer, it suffices that the last two hidden layers have Ω(√(N)) neurons, where N is the number of training samples. Finally, we provide experimental evidence demonstrating that the presented condition is satisfied in practical settings even when dropout stability does not hold.
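To make the phenomenon in the first sentence concrete, here is a minimal sketch (not the paper's construction) of how one typically probes connectivity: evaluate the loss along the straight-line segment between two trained parameter vectors of the same architecture. The helper names (make_mlp, path_losses) and the toy data are illustrative assumptions, not part of the paper.

```python
# Sketch: probe the loss along a linear path between two parameter vectors
# theta_a and theta_b of the same network. In practice theta_a, theta_b would
# come from two independent SGD runs; here fresh initializations stand in.
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def make_mlp(d_in=10, width=64, d_out=2):
    # Small two-layer ReLU network used only for illustration.
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, d_out))

def path_losses(model, theta_a, theta_b, x, y, steps=11):
    """Loss at evenly spaced points on the segment (1 - t) * theta_a + t * theta_b."""
    criterion = nn.CrossEntropyLoss()
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        vector_to_parameters((1 - t) * theta_a + t * theta_b, model.parameters())
        with torch.no_grad():
            losses.append(criterion(model(x), y).item())
    return losses

# Toy usage on random data.
x, y = torch.randn(128, 10), torch.randint(0, 2, (128,))
model = make_mlp()
theta_a = parameters_to_vector(make_mlp().parameters()).detach()
theta_b = parameters_to_vector(make_mlp().parameters()).detach()
print(path_losses(model, theta_a, theta_b, x, y))
```

A flat loss profile along such a path (or along a piecewise-linear path through constructed intermediate points) is the empirical signature of mode connectivity; the paper's contribution is a condition under which low-loss paths provably exist.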


