Restricted Strong Convexity of Deep Learning Models with Smooth Activations

09/29/2022
by Arindam Banerjee, et al.

We consider the problem of optimization of deep learning models with smooth activation functions. While there exist influential results on the problem from the “near initialization” perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with L layers, width m, and initialization variance σ_0^2. First, for suitable σ_0^2, we establish an O(poly(L)/√(m)) upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC), which holds as long as the squared norm of the average gradient of the predictors is Ω(poly(L)/√(m)) for the square loss; we also present results for more general losses. The RSC-based analysis does not need the “near initialization” perspective and guarantees geometric convergence for gradient descent (GD). To the best of our knowledge, this is the first result establishing geometric convergence of GD based on RSC for deep learning models, providing an alternative sufficient condition for convergence that does not depend on the widely used Neural Tangent Kernel (NTK). We share preliminary experimental results supporting our theoretical advances.
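As a reading aid, the sketch below summarizes the standard RSC argument the abstract alludes to. The symbols here (empirical loss 𝓛, predictors f(θ; x_i) on n training points, RSC parameter α, smoothness parameter β, restricted set S, step size 1/β) are illustrative notation, not the paper's exact definitions.

% Quantity the abstract ties to the RSC condition: squared norm of the
% average gradient of the predictors over the n training points.
\Big\| \tfrac{1}{n} \textstyle\sum_{i=1}^{n} \nabla_\theta f(\theta; x_i) \Big\|_2^2 \;=\; \Omega\!\big(\mathrm{poly}(L)/\sqrt{m}\big)

% Restricted strong convexity with parameter alpha over a restricted set S:
\mathcal{L}(\theta') \;\ge\; \mathcal{L}(\theta) + \langle \nabla \mathcal{L}(\theta),\, \theta' - \theta \rangle + \tfrac{\alpha}{2}\,\|\theta' - \theta\|_2^2 , \qquad \theta, \theta' \in S

% Combined with beta-smoothness, GD with step size 1/beta contracts geometrically:
\mathcal{L}(\theta_{t+1}) - \mathcal{L}^\ast \;\le\; \big(1 - \tfrac{\alpha}{\beta}\big)\,\big(\mathcal{L}(\theta_t) - \mathcal{L}^\ast\big)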

Related research

06/10/2017 · Recovery Guarantees for One-hidden-layer Neural Networks
In this paper, we consider regression problems with one-hidden-layer neu...

02/09/2021 · When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?
We establish conditions under which gradient descent applied to fixed-wi...

09/14/2015 · Dropping Convexity for Faster Semi-definite Optimization
We study the minimization of a convex function f(X) over the set of n × n...

06/13/2023 · Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity
Current state-of-the-art analyses on the convergence of gradient descent...

11/11/2019 · Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size
Deep neural networks are highly expressive machine learning models with ...

06/07/2023 · Achieving Consensus over Compact Submanifolds
We consider the consensus problem in a decentralized network, focusing o...

11/07/2016 · Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks
Modern convolutional networks, incorporating rectifiers and max-pooling,...
