Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets

09/17/2023
by Pulkit Gopalani, et al.

In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minimum of an appropriately regularized logistic empirical risk of depth-2 nets, for arbitrary data and for any number of gates with adequately smooth and bounded activations such as sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous-time SGD that additionally applies to smooth unbounded activations like SoftPlus. Our key idea is to show that the Frobenius-norm regularized logistic loss on such constant-sized neural nets is a "Villani function", which lets us build on recent progress in analyzing SGD on such objectives.
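To make the object of study concrete, below is a minimal NumPy sketch of mini-batch SGD on a Frobenius-norm regularized logistic empirical risk of a depth-2 sigmoid net. This is not the paper's code: the width, regularization constant `lam`, step size, batch size, synthetic data, and the choice to regularize both weight layers are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): mini-batch SGD
# on the Frobenius-norm regularized logistic loss of a depth-2 sigmoid network.
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary binary-classification data: n points in d dimensions, labels in {-1, +1}.
n, d, width = 200, 5, 16
X = rng.normal(size=(n, d))
y = np.sign(rng.normal(size=n))

# Depth-2 net: f(x) = a^T sigmoid(W x), with inner weights W and outer weights a.
W = 0.1 * rng.normal(size=(width, d))
a = 0.1 * rng.normal(size=width)
lam = 0.1   # regularization strength (assumed value)
lr = 0.05   # SGD step size (assumed value)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(xb, yb):
    """Regularized logistic loss on a mini-batch and its gradients w.r.t. (W, a)."""
    h = sigmoid(xb @ W.T)        # (batch, width) hidden activations
    f = h @ a                    # network outputs
    margins = yb * f
    # Logistic loss log(1 + exp(-y f)) plus (lam/2) * (||W||_F^2 + ||a||^2).
    loss = np.mean(np.log1p(np.exp(-margins))) \
         + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))
    dloss_df = -yb * sigmoid(-margins) / len(yb)   # d(avg loss)/d f_i
    grad_a = h.T @ dloss_df + lam * a
    dh = np.outer(dloss_df, a) * h * (1.0 - h)     # back through the sigmoid
    grad_W = dh.T @ xb + lam * W
    return loss, grad_W, grad_a

# Plain mini-batch SGD on the regularized empirical risk.
for step in range(2000):
    idx = rng.choice(n, size=32, replace=False)
    loss, gW, ga = loss_and_grads(X[idx], y[idx])
    W -= lr * gW
    a -= lr * ga
    if step % 500 == 0:
        print(f"step {step:4d}  regularized batch loss {loss:.4f}")
```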

research
10/20/2022

Global Convergence of SGD On Two Layer Neural Nets

In this note we demonstrate provable convergence of SGD to the global mi...
research
01/22/2019

DTN: A Learning Rate Scheme with Convergence Rate of O(1/t) for SGD

We propose a novel diminishing learning rate scheme, coined Decreasing-T...
research
03/27/2020

Piecewise linear activations substantially shape the loss surfaces of neural networks

Understanding the loss surface of a neural network is fundamentally impo...
research
07/07/2020

Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle

Although SGD with random reshuffle has been widely-used in machine learn...
research
12/10/2018

Why Does Stagewise Training Accelerate Convergence of Testing Error Over SGD?

Stagewise training strategy is commonly used for learning neural network...
research
09/07/2021

Regularized Learning in Banach Spaces

This article presents a different way to study the theory of regularized...
