Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems

03/08/2016
by Tomoya Murata, et al.

We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning. We propose two new stochastic gradient methods based on the stochastic dual averaging method combined with variance reduction. Our methods generate sparser solutions than existing methods because they do not need to average the history of solutions. This is favorable in terms of both interpretability and generalization. Moreover, our methods have theoretical guarantees for both strongly and non-strongly convex regularizers, and they achieve the best known convergence rates among existing nonaccelerated stochastic gradient methods.
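To make the general recipe concrete (dual averaging driven by variance-reduced stochastic gradients, with the last iterate kept instead of an average), here is a minimal sketch. It assumes a lasso instance of the composite problem, P(x) = (1/(2n))||Ax - b||^2 + lam*||x||_1, where each f_i is a squared loss and the regularizer is l1. It also assumes SVRG-style variance reduction, a constant step size eta, a stage length m = 2n, and the stage snapshot x_tilde as the prox center. These choices, and every function name (soft_threshold, lasso_objective, vr_dual_averaging, ista), are illustrative assumptions. This is not either of the two methods proposed in the paper. The ista function is a plain proximal gradient baseline, included only as a check.

```python
import numpy as np


def soft_threshold(u, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def lasso_objective(A, b, x, lam):
    """P(x) = (1/(2n)) * ||A x - b||^2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * np.mean(r ** 2) + lam * np.abs(x).sum()


def vr_dual_averaging(A, b, lam, eta, n_stages=30, m=None, seed=0):
    """Hypothetical sketch: dual averaging fed by SVRG-style gradient estimates.

    Assumptions (not taken from the paper): squared loss f_i, l1 regularizer,
    constant step eta, stage length m = 2n, prox center = stage snapshot.

    Stage s fixes a snapshot x_tilde and its full gradient mu. Inner step k
    samples i, forms v_k = grad f_i(x) - grad f_i(x_tilde) + mu, adds it to
    g_sum = v_1 + ... + v_k and solves the dual averaging subproblem
        argmin_x <g_sum, x> + k * lam * ||x||_1 + ||x - x_tilde||^2 / (2 * eta),
    whose closed form is soft_threshold(x_tilde - eta * g_sum, eta * k * lam).
    The last inner iterate (not an average) becomes the next snapshot.
    """
    n, d = A.shape
    m = 2 * n if m is None else m
    rng = np.random.default_rng(seed)
    x_tilde = np.zeros(d)
    for _ in range(n_stages):
        mu = A.T @ (A @ x_tilde - b) / n   # full gradient of the smooth part
        x = x_tilde.copy()
        g_sum = np.zeros(d)
        for k in range(1, m + 1):
            a_i = A[rng.integers(n)]
            # For f_i(x) = 0.5 * (a_i @ x - b_i)^2 the b_i terms cancel here.
            v = (a_i @ (x - x_tilde)) * a_i + mu
            g_sum += v
            # Regularizer handled exactly: threshold grows with k (lazy update).
            x = soft_threshold(x_tilde - eta * g_sum, eta * k * lam)
        x_tilde = x
    return x_tilde


def ista(A, b, lam, n_iter=3000):
    """Plain proximal gradient baseline, used only to check the sketch."""
    n, d = A.shape
    L = np.linalg.eigvalsh(A.T @ A / n)[-1]   # Lipschitz constant of grad F
    x = np.zeros(d)
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - b) / (n * L), lam / L)
    return x
```

Dividing the subproblem by k shows the same update in averaged form. It uses the mean gradient g_sum / k, with prox weight 1/(eta * k). Without the regularizer, the update reduces to ordinary SVRG steps x_k = x_{k-1} - eta * v_k. With it, any coordinate whose accumulated gradient stays below k * lam in magnitude is set exactly to zero at every step. Returning the last iterate rather than an average preserves those zeros. In a usage run, one might normalize the rows of A to unit length, so that each f_i is 1-smooth, pick eta = 0.1, and compare vr_dual_averaging(A, b, lam, eta) against ista(A, b, lam) on both objective value and sparsity pattern.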
