ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method

by Zhize Li, et al.

We propose ANITA, a novel accelerated variance-reduced gradient method for finite-sum optimization, and analyze it in both the general convex and strongly convex settings. In the general convex setting, ANITA achieves the convergence result O(n·min{1 + log(1/(√n·ϵ)), log √n} + √(nL/ϵ)), improving on the previous best result O(n·min{log(1/ϵ), log n} + √(nL/ϵ)) given by Varag (Lan et al., 2019). In particular, for a very wide range of ϵ, namely ϵ ∈ (0, L/(n·log²√n)] ∪ [1/√n, +∞), where ϵ is the error tolerance f(x_T) − f* ≤ ϵ and n is the number of data samples, ANITA achieves the optimal convergence result O(n + √(nL/ϵ)), matching the lower bound Ω(n + √(nL/ϵ)) provided by Woodworth and Srebro (2016). To the best of our knowledge, ANITA is the first accelerated algorithm that exactly achieves this optimal result O(n + √(nL/ϵ)) for general convex finite-sum problems. In the strongly convex setting, we also show that ANITA achieves the optimal convergence result O((n + √(nL/μ)) log(1/ϵ)), matching the lower bound Ω((n + √(nL/μ)) log(1/ϵ)) provided by Lan and Zhou (2015). Moreover, ANITA enjoys a simpler loopless algorithmic structure, unlike previous accelerated algorithms such as Katyusha (Allen-Zhu, 2017) and Varag (Lan et al., 2019), which rely on an inconvenient double-loop structure. Finally, experimental results show that ANITA converges faster than the previous state of the art, Varag (Lan et al., 2019), validating our theoretical results and confirming its practical superiority.
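To illustrate the "loopless" structure the abstract contrasts with Katyusha's and Varag's double loop, here is a minimal sketch of a loopless variance-reduced gradient method in the style of L-SVRG (not the exact ANITA algorithm, which additionally uses acceleration): instead of an outer loop that periodically recomputes a full-gradient snapshot, the snapshot is refreshed by a coin flip with small probability p at each iteration. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

def loopless_vr_gd(grad_i, n, x0, lr=0.02, p=None, iters=5000, seed=None):
    """Loopless variance-reduced gradient descent (L-SVRG-style sketch).

    grad_i(x, i) returns the gradient of the i-th component function at x.
    The full-gradient snapshot is refreshed with probability p per step,
    replacing the inner/outer double-loop structure of Katyusha/Varag.
    """
    rng = np.random.default_rng(seed)
    if p is None:
        p = 1.0 / n  # a common choice: expected snapshot refresh once per epoch
    x = x0.copy()
    w = x0.copy()  # snapshot point
    full_grad = np.mean([grad_i(w, j) for j in range(n)], axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        # unbiased variance-reduced gradient estimator
        g = grad_i(x, i) - grad_i(w, i) + full_grad
        x = x - lr * g
        if rng.random() < p:  # coin flip: refresh the snapshot (no outer loop)
            w = x.copy()
            full_grad = np.mean([grad_i(w, j) for j in range(n)], axis=0)
    return x

# Toy finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
data_rng = np.random.default_rng(0)
n, d = 50, 5
A = data_rng.normal(size=(n, d))
b = data_rng.normal(size=n)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # exact minimizer for reference
x_hat = loopless_vr_gd(grad_i, n, np.zeros(d), seed=1)
```

Because the estimator is variance-reduced, the iterates converge to the exact minimizer (no noise ball), so `x_hat` should approach `x_star` up to small stochastic error.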

