A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

by Zhize Li, et al.
Tsinghua University

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is the sum of a differentiable (possibly nonconvex) component and a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. The algorithm is a slight variant of the ProxSVRG algorithm [Reddi et al., 2016b]. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results (in terms of the number of stochastic gradient oracle calls and proximal operations), and improves or generalizes some others. In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case. ProxSVRG+ is more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., 2016b]. Finally, for nonconvex functions satisfying the Polyak-Łojasiewicz condition, we show that ProxSVRG+ achieves a global linear convergence rate without restart. In this case, ProxSVRG+ is never worse than ProxGD and ProxSVRG/SAGA, and sometimes outperforms them (and generalizes the results of SCSG).
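To make the structure described above concrete, here is a minimal sketch of a proximal, variance-reduced stochastic gradient loop in the ProxSVRG style. It is an illustration under stated assumptions, not the authors' ProxSVRG+ as specified in the paper: the smooth component is taken to be a least-squares loss (convex, chosen only for simplicity), the convex nonsmooth component is an L1 penalty whose proximal operator is soft-thresholding, and the names `prox_svrg_sketch`, `anchor_batch`, and `mini_batch` are hypothetical. Computing the anchor gradient on a subsample rather than the full data set is also an assumption about how such a variant might look.

```python
import random

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def grad_i(x, a, y):
    """Gradient of the assumed smooth component f_i(x) = 0.5 * (a.x - y)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - y
    return [r * aj for aj in a]

def mean_grad(x, A, Y, idx):
    """Average of grad_i over the sample indices in idx."""
    g = [0.0] * len(x)
    for i in idx:
        gi = grad_i(x, A[i], Y[i])
        for j in range(len(x)):
            g[j] += gi[j] / len(idx)
    return g

def prox_svrg_sketch(A, Y, lam, eta, epochs, inner_steps,
                     anchor_batch, mini_batch, seed=0):
    """Hypothetical ProxSVRG-style loop for (1/n) sum_i f_i(x) + lam * ||x||_1."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(epochs):
        snapshot = x[:]
        # Anchor gradient at the snapshot; anchor_batch == n gives the full gradient.
        anchor_idx = rng.sample(range(n), anchor_batch)
        g_anchor = mean_grad(snapshot, A, Y, anchor_idx)
        for _ in range(inner_steps):
            idx = [rng.randrange(n) for _ in range(mini_batch)]
            g_cur = mean_grad(x, A, Y, idx)
            g_old = mean_grad(snapshot, A, Y, idx)
            # Variance-reduced gradient estimate.
            v = [gc - go + ga for gc, go, ga in zip(g_cur, g_old, g_anchor)]
            # Gradient step on the smooth part, then the proximal step.
            x = soft_threshold([xj - eta * vj for xj, vj in zip(x, v)], eta * lam)
    return x
```

As a usage example, with `A = [[1.0, 0.0], [0.0, 1.0]]`, `Y = [2.0, 0.0]`, and `lam = 0.1`, the objective is 0.25(x1 - 2)^2 + 0.25 x2^2 + 0.1(|x1| + |x2|), whose minimizer is x = (1.8, 0); calling `prox_svrg_sketch(A, Y, lam=0.1, eta=0.5, epochs=30, inner_steps=4, anchor_batch=2, mini_batch=1)` should return a point close to it.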




Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

We propose and analyze several stochastic gradient algorithms for findin...

Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

Two types of zeroth-order stochastic algorithms have recently been desig...

Convex Optimization with Nonconvex Oracles

In machine learning and optimization, one often wants to minimize a conv...

Analysis of nonsmooth stochastic approximation: the differential inclusion approach

In this paper we address the convergence of stochastic approximation whe...

Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

The widely used stochastic gradient methods for minimizing nonconvex com...

Stochastic Gradient Langevin Dynamics with Variance Reduction

Stochastic gradient Langevin dynamics (SGLD) has gained the attention of...

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

We present a unified framework to analyze the global convergence of Lang...
