Convergence Analysis of Homotopy-SGD for Non-Convex Optimization

11/20/2020
by Matilde Gargiani, et al.

First-order stochastic methods for solving large-scale non-convex optimization problems are widely used in many big-data applications, e.g. training deep neural networks and other complex, potentially non-convex machine learning models. Their inexpensive iterations, however, typically come at the cost of a slow global convergence rate (mostly sublinear), so a very large number of iterations is needed before the iterates reach a neighborhood of a minimizer. In this work, we present a first-order stochastic algorithm that combines homotopy methods with SGD, called Homotopy-Stochastic Gradient Descent (H-SGD), which is closely connected to several heuristics proposed in the literature, e.g. optimization by Gaussian continuation, training by diffusion, and mollifying networks. Under mild assumptions on the problem structure, we conduct a theoretical analysis of the proposed algorithm. Our analysis shows that, with a specifically designed scheme for the homotopy parameter, H-SGD enjoys a global linear rate of convergence to a neighborhood of a minimum while maintaining fast and inexpensive iterations. Experimental evaluations confirm the theoretical results and show that H-SGD can outperform standard SGD.
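The abstract describes H-SGD only at a high level; the Python sketch below illustrates the general structure under stated assumptions rather than the authors' exact method. A family of Gaussian-smoothed surrogates (the "optimization by Gaussian continuation" connection mentioned above) plays the role of the homotopy, annealed toward the original objective while SGD runs on each surrogate with warm starts. The test objective f, the sigma schedule, the number of inner steps, and the step size are all illustrative choices.

```python
# A minimal sketch of the homotopy-SGD idea described above, not the authors'
# exact algorithm: the objective f, the Gaussian-smoothing surrogates, the
# sigma schedule, and the step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative non-convex objective (Rastrigin-like); stands in for a training loss.
    return np.sum(x**2 - 2.0 * np.cos(3.0 * x))

def grad_f(x):
    # Analytic gradient of f; in practice this would be a stochastic minibatch gradient.
    return 2.0 * x + 6.0 * np.sin(3.0 * x)

def grad_smoothed(x, sigma, n_samples=8):
    # Unbiased two-point estimator of the gradient of the Gaussian-smoothed
    # surrogate f_sigma(x) = E_{u ~ N(0, I)}[f(x + sigma * u)]:
    #   grad f_sigma(x) = E_u[(f(x + sigma*u) - f(x - sigma*u)) / (2*sigma) * u]
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + sigma * u) - f(x - sigma * u)) / (2.0 * sigma) * u
    return g / n_samples

def homotopy_sgd(x0, sigmas=(2.0, 1.0, 0.5, 0.2), inner_steps=300, lr=0.01):
    x = np.asarray(x0, dtype=float)
    # Outer homotopy loop: anneal the smoothing level from a heavily smoothed,
    # easier surrogate toward the original objective, warm-starting each stage.
    for sigma in sigmas:
        # Inner loop: plain SGD on the current surrogate.
        for _ in range(inner_steps):
            x -= lr * grad_smoothed(x, sigma)
    # Final stage: SGD on the original objective (here, noisy full gradients).
    for _ in range(inner_steps):
        x -= lr * (grad_f(x) + 0.1 * rng.standard_normal(x.shape))
    return x

x_star = homotopy_sgd(x0=rng.uniform(-3.0, 3.0, size=5))
print("final objective value:", f(x_star))
```

The warm start from one smoothing level to the next is the key design choice in homotopy schemes of this kind: the heavily smoothed early surrogates are meant to steer the iterates toward a good basin before the smoothing is removed.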


Related research

02/18/2022
Tackling benign nonconvexity with smoothing and stochastic gradients
Non-convex optimization problems are ubiquitous in machine learning, esp...

01/02/2019
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Stochastic gradient descent (SGD) has been found to be surprisingly effe...

07/27/2020
Multi-Level Local SGD for Heterogeneous Hierarchical Networks
We propose Multi-Level Local SGD, a distributed gradient method for lear...

04/19/2013
Optimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections
We consider stochastic strongly convex optimization with a complex inequ...

10/06/2022
Scaling up Stochastic Gradient Descent for Non-convex Optimisation
Stochastic gradient descent (SGD) is a widely adopted iterative method f...

01/10/2020
Choosing the Sample with Lowest Loss makes SGD Robust
The presence of outliers can potentially significantly skew the paramete...

07/20/2018
signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization
Stochastic gradient descent (SGD) is one of the most widely used optimiz...
