Tackling benign nonconvexity with smoothing and stochastic gradients

02/18/2022
by Harsh Vardhan, et al.

Non-convex optimization problems are ubiquitous in machine learning, especially in deep learning. While such complex problems can often be optimized successfully in practice using stochastic gradient descent (SGD), theoretical analysis cannot adequately explain this success: standard analyses do not show global convergence of SGD on non-convex functions, only convergence to stationary points (which may be local minima or saddle points). We identify a broad class of non-convex functions for which we can show that perturbed SGD (gradient descent perturbed by stochastic noise, covering SGD as a special case) converges to a global minimum, or to a neighborhood thereof, in contrast to gradient descent without noise, which can get stuck in local minima far from any global solution. For example, on non-convex functions that are close to a convex-like function (one that is strongly convex or satisfies the Polyak-Łojasiewicz (PL) inequality), we show that SGD can converge linearly to a global optimum.
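
To make the mechanism concrete, here is a minimal sketch, not the paper's algorithm: the objective, step size, noise schedule, and the helper perturbed_gd below are all illustrative assumptions. It compares plain gradient descent with noise-perturbed gradient descent on a one-dimensional function that is a strongly convex quadratic plus a small oscillation, i.e., a benignly non-convex function close to a strongly convex one. Plain GD gets trapped in a spurious local minimum, while the perturbed iterates typically reach a neighborhood of the global minimum.

```python
import numpy as np

# Illustrative objective (not from the paper): a strongly convex quadratic
# plus a small oscillation that creates spurious local minima. The global
# minimum is near x* ~ -0.15; a spurious local minimum sits near x ~ 1.02.
def f(x):
    return 0.5 * x**2 + 0.15 * np.sin(10 * x)

def grad_f(x):
    return x + 1.5 * np.cos(10 * x)

def perturbed_gd(x0, steps=5000, lr=0.05, noise_std=0.0, seed=0):
    """Gradient descent with additive Gaussian gradient noise.

    noise_std = 0 recovers plain gradient descent; noise_std > 0 gives
    perturbed GD (SGD is the special case where the noise comes from
    minibatch sampling). The noise is slowly decayed so the iterate can
    settle once it has escaped the shallow local minima.
    """
    rng = np.random.default_rng(seed)
    x = x0
    for t in range(steps):
        sigma = noise_std / np.sqrt(1.0 + t / 1000.0)  # slow decay
        x -= lr * (grad_f(x) + sigma * rng.standard_normal())
    return x

x0 = 1.3  # starts in the basin of a spurious local minimum
x_gd = perturbed_gd(x0)                  # stays trapped near x ~ 1.02
x_sgd = perturbed_gd(x0, noise_std=1.5)  # typically ends near x* ~ -0.15
print(f"plain GD     : x = {x_gd:+.3f}, f(x) = {f(x_gd):+.3f}")
print(f"perturbed GD : x = {x_sgd:+.3f}, f(x) = {f(x_sgd):+.3f}")
```

Intuitively, the injected noise makes the iterates behave as if they were descending a locally smoothed version of f in which the shallow local minima are washed out, in the spirit of the smoothing in the title; the decaying schedule is one simple way to trade exploration early for accuracy late.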

Related research

03/06/2015 · Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition
We analyze stochastic gradient descent for optimizing non-convex functio...

06/13/2022 · On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms
Stochastic gradient descent (SGD) algorithm is the method of choice in m...

10/04/2021 · Global Convergence and Stability of Stochastic Gradient Descent
In machine learning, stochastic gradient descent (SGD) is widely deploye...

03/24/2021 · Why Do Local Methods Solve Nonconvex Problems?
Non-convex optimization is ubiquitous in modern machine learning. Resear...

11/20/2020 · Convergence Analysis of Homotopy-SGD for non-convex optimization
First-order stochastic methods for solving large-scale non-convex optimi...

02/17/2018 · An Alternative View: When Does SGD Escape Local Minima?
Stochastic gradient descent (SGD) is widely used in machine learning. Al...

02/13/2023 · Convergence analysis for a nonlocal gradient descent method via directional Gaussian smoothing
We analyze the convergence of a nonlocal gradient descent method for min...
