A Proximal Stochastic Gradient Method with Progressive Variance Reduction

03/19/2014
by   Lin Xiao, et al.

We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, where they are known as regularized empirical risk minimization. We propose and analyze a new proximal stochastic gradient method, which uses a multi-stage scheme to progressively reduce the variance of the stochastic gradient. While each iteration of this algorithm has a cost similar to that of the classical stochastic gradient method (or incremental gradient method), we show that the expected objective value converges to the optimum at a geometric rate. The overall complexity of this method is much lower than that of both the proximal full gradient method and the standard proximal stochastic gradient method.
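To make the multi-stage scheme concrete, below is a minimal sketch (not the paper's reference implementation) of how such a variance-reduced proximal stochastic gradient method can be instantiated for ℓ_1-regularized least squares, where the proximal mapping is soft-thresholding. At each stage, a full gradient is computed at an anchor point; inner iterations then use a variance-reduced stochastic gradient followed by a proximal step. The function names, step size, and stage length are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: the proximal mapping of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_svrg(A, b, lam, eta, n_stages, m, seed=0):
    """Multi-stage variance-reduced proximal stochastic gradient for
    minimizing (1/2n) * ||A x - b||^2 + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_tilde = np.zeros(d)
    for _ in range(n_stages):
        # Full gradient of the smooth part at the stage anchor x_tilde.
        mu = A.T @ (A @ x_tilde - b) / n
        x = x_tilde.copy()
        x_sum = np.zeros(d)
        for _ in range(m):
            i = rng.integers(n)          # sample one component uniformly
            a_i = A[i]
            # Variance-reduced gradient estimate: unbiased, and its
            # variance shrinks as x and x_tilde approach the optimum.
            v = a_i * ((a_i @ x) - b[i]) - a_i * ((a_i @ x_tilde) - b[i]) + mu
            x = prox_l1(x - eta * v, eta * lam)   # proximal step
            x_sum += x
        x_tilde = x_sum / m              # average the inner iterates
    return x_tilde

# Tiny usage example on synthetic data (illustrative values).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(500)
    x_hat = prox_svrg(A, b, lam=0.01, eta=0.05, n_stages=30, m=500)
    print(np.round(x_hat[:5], 3))
```

Averaging the inner iterates to form the next anchor point follows one of the stage-update options analyzed in the paper; because the full gradient is recomputed only once per stage, each inner iteration touches a single component function, which is what keeps the per-iteration cost at the level of plain stochastic gradient descent.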


research · 01/31/2016
A Proximal Stochastic Quasi-Newton Algorithm
In this paper, we discuss the problem of minimizing the sum of two conve...

research · 12/20/2021
Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks
In decentralized learning, a network of nodes cooperate to minimize an o...

research · 04/07/2020
Orthant Based Proximal Stochastic Gradient Method for ℓ_1-Regularized Optimization
Sparsity-inducing regularization problems are ubiquitous in machine lear...

research · 06/24/2015
Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization
We develop a family of accelerated stochastic algorithms that minimize s...

research · 06/14/2022
A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization
We consider the problem of training a deep neural network with nonsmooth...

research · 04/16/2015
Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting
We propose mS2GD: a method incorporating a mini-batching scheme for impr...

research · 02/12/2021
Proximal and Federated Random Reshuffling
Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)...
