Convergence Guarantees for Stochastic Subgradient Methods in Nonsmooth Nonconvex Optimization

07/19/2023
by Nachuan Xiao, et al.

In this paper, we investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially for training neural networks built from nonsmooth activation functions. We develop a novel framework that assigns different timescales to the stepsizes for updating the momentum terms and the variables, respectively. Under mild conditions, we prove the global convergence of our proposed framework in both the single-timescale and two-timescale cases. We show that our framework encompasses a wide range of well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD, and clipped SGD. Furthermore, when the objective function takes a finite-sum form, we establish convergence properties for these SGD-type methods based on our framework. In particular, we prove that, under mild assumptions, these methods find Clarke stationary points of the objective function with randomly chosen stepsizes and initial points. Preliminary numerical experiments demonstrate the high efficiency of the analyzed SGD-type methods.
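The abstract describes a framework in which the momentum term and the iterate are updated with separate stepsize sequences, and particular choices of the update direction recover heavy-ball SGD, SignSGD, normalized SGD, and clipped SGD. Below is a minimal sketch of one such scheme, based only on the abstract: the function name `two_timescale_sgd`, the `direction` hook, and the stepsize schedules are illustrative assumptions, not the paper's notation or exact algorithm.

```python
# Illustrative sketch of a two-timescale momentum scheme (assumed form, not
# the paper's exact formulation): the momentum m_k and the iterate x_k use
# separate stepsize sequences theta_k and eta_k.
import numpy as np

def two_timescale_sgd(grad_oracle, x0, theta, eta, direction, n_steps=1000):
    """grad_oracle(x) returns a stochastic subgradient estimate at x;
    theta(k) and eta(k) are the momentum and iterate stepsizes."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for k in range(n_steps):
        g = grad_oracle(x)
        m = (1.0 - theta(k)) * m + theta(k) * g   # momentum timescale theta_k
        x = x - eta(k) * direction(m)             # iterate timescale eta_k
    return x

# Hypothetical choices of `direction` that mimic the variants named in the
# abstract (Lion, which uses a second momentum coefficient, is omitted):
heavy_ball = lambda m: m                                          # heavy-ball SGD
sign_sgd   = lambda m: np.sign(m)                                 # SignSGD
normalized = lambda m: m / max(np.linalg.norm(m), 1e-12)          # normalized SGD
clipped    = lambda m: m * min(1.0, 1.0 / max(np.linalg.norm(m),  # clipped SGD,
                                              1e-12))             # radius 1

# Example usage on a nonsmooth objective f(x) = ||x||_1 with noisy subgradients:
rng = np.random.default_rng(0)
oracle = lambda x: np.sign(x) + 0.1 * rng.standard_normal(x.shape)
x_final = two_timescale_sgd(oracle, x0=np.ones(5),
                            theta=lambda k: 1.0 / (k + 1) ** 0.6,
                            eta=lambda k: 1.0 / (k + 1),
                            direction=heavy_ball)
```

In this sketch, choosing theta(k) proportional to eta(k) would correspond to the single-timescale case, while letting theta(k) decay more slowly than eta(k) gives a two-timescale scheme; how the paper actually parameterizes the two regimes is detailed in the full text.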


