
On Stochastic Variance Reduced Gradient Method for Semidefinite Optimization
The low-rank stochastic semidefinite optimization has attracted rising a...

Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop
The stochastic variance-reduced gradient method (SVRG) and its accelerat...

Stochastically Controlled Stochastic Gradient for the Convex and Nonconvex Composition problem
In this paper, we consider the convex and nonconvex composition problem...

Almost Tune-Free Variance Reduction
The variance reduction class of algorithms including the representative ...

Optimal minibatch and step sizes for SAGA
Recently it has been shown that the step sizes of a family of variance r...

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis
We propose the particle dual averaging (PDA) method, which generalizes t...

TRPL+K: Thick-Restart Preconditioned Lanczos+K Method for Large Symmetric Eigenvalue Problems
The Lanczos method is one of the standard approaches for computing a few...
Towards closing the gap between the theory and practice of SVRG
Among the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang 2013). SVRG is an inner-outer loop based method: in the outer loop a reference full gradient is evaluated, after which m ∈ ℕ steps of an inner loop are executed, in which the reference gradient is used to build a variance reduced estimate of the current gradient. The simplicity of the SVRG method and its analysis has led to multiple extensions and variants, even for nonconvex optimization. Yet there is a significant gap between the parameter settings that the analysis suggests and what is known to work well in practice. Our first contribution is that we take several steps towards closing this gap. In particular, the current analysis shows that m should be of the order of the condition number so that the resulting method has a favorable complexity. Yet in practice m = n works well regardless of the condition number, where n is the number of data points. Furthermore, the current analysis shows that the inner iterates have to be reset using averaging after every outer loop. Yet in practice SVRG works best when the inner iterates are updated continuously and not reset. We provide an analysis of these practical settings and show that they achieve the same favorable complexity as the original analysis (with slightly better constants). Our second contribution is to provide a more general analysis than had previously been done by using arbitrary sampling, which allows us to analyse virtually all forms of minibatching through a single theorem. Since our setup and analysis reflect what is done in practice, we are able to use our theory to set parameters such as the minibatch size and step size in a way that produces a more efficient algorithm in practice, as we show in extensive numerical experiments.
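
The inner-outer loop structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or code: it assumes uniform sampling of a single index per step (the simplest special case, not the arbitrary sampling analysed in the paper), uses the practical default m = n, and carries the inner iterate across outer loops instead of resetting it by averaging. The function name svrg, the helper grad_i, the toy least-squares data A, b, x_star, and the step size 0.1 are all hypothetical choices made for illustration.

```python
import random


def svrg(grad_i, n, x0, step, epochs, m=None, seed=0):
    """Minimal SVRG sketch for minimizing (1/n) * sum_i f_i(x).

    grad_i(i, x) must return the gradient of f_i at x as a list of floats.
    m is the inner-loop length; m = n is used when it is not given.
    """
    rng = random.Random(seed)
    m = n if m is None else m
    x = list(x0)
    d = len(x)
    for _ in range(epochs):
        # Outer loop: snapshot the iterate and evaluate the reference full gradient.
        w = list(x)
        full = [0.0] * d
        for i in range(n):
            g = grad_i(i, w)
            for j in range(d):
                full[j] += g[j] / n
        # Inner loop: m steps using the variance reduced gradient estimate.
        # x is not reset afterwards; the next outer loop continues from it.
        for _ in range(m):
            i = rng.randrange(n)
            gx = grad_i(i, x)
            gw = grad_i(i, w)
            for j in range(d):
                x[j] -= step * (gx[j] - gw[j] + full[j])
    return x


# Hypothetical toy problem: f_i(x) = 0.5 * (a_i . x - b_i)^2 with consistent
# data, so the minimizer x_star is known in advance.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x_star = [2.0, -1.0]
b = [sum(a_j * s_j for a_j, s_j in zip(a, x_star)) for a in A]


def grad_i(i, x):
    # Gradient of f_i: residual times the i-th data row.
    r = sum(a_j * x_j for a_j, x_j in zip(A[i], x)) - b[i]
    return [r * a_j for a_j in A[i]]


x_hat = svrg(grad_i, n=len(A), x0=[0.0, 0.0], step=0.1, epochs=100)
```

On this small, well-conditioned problem x_hat should end up close to x_star. The step size here was picked by hand for the toy data and is not derived from the paper's theory.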