The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization

06/08/2020
by   W. Tao, et al.

The extrapolation strategy introduced by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude for smooth convex objectives, has led to tremendous success in training machine learning models. In this article, we theoretically study the convergence of the individual iterates of projected subgradient (PSG) methods for nonsmooth convex optimization problems under Nesterov's extrapolation, which we call individual convergence. We prove that Nesterov's extrapolation is strong enough to make the individual convergence of PSG optimal for nonsmooth problems. In light of this result, a direct modification of the subgradient evaluation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as an interesting step toward the open question about stochastic gradient descent (SGD) posed by Shamir. Furthermore, we extend the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings. Compared with other state-of-the-art nonsmooth methods, the derived algorithms can serve as an alternative to basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method is an efficient tool for solving large-scale l1-regularized hinge-loss learning problems. Several comparison experiments demonstrate that our individual output not only achieves an optimal convergence rate but also guarantees better sparsity than the averaged solution.
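To make the setting concrete, the following is a minimal sketch of subgradient descent with Nesterov-style extrapolation on an l1-regularized hinge-loss problem. It is an illustration of the general scheme the abstract describes, not the paper's exact algorithm: the synthetic data, the step-size schedule `1/sqrt(t)`, and the momentum weight `(t-1)/(t+2)` are all assumptions chosen for demonstration.

```python
import numpy as np

def hinge_l1_subgrad(w, X, y, lam):
    """Subgradient of mean hinge loss + lam * ||w||_1 at w."""
    margins = y * (X @ w)
    active = margins < 1.0                      # points violating the margin
    g_hinge = -(X[active].T @ y[active]) / len(y)
    return g_hinge + lam * np.sign(w)

def psg_nesterov(X, y, lam=0.01, T=500):
    """Subgradient method with Nesterov-style extrapolation (illustrative sketch).

    Schedules are hypothetical: step size O(1/sqrt(t)) for the nonsmooth
    case, extrapolation weight (t-1)/(t+2) as in classic accelerated methods.
    """
    w = w_prev = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        beta = (t - 1) / (t + 2)                # extrapolation weight
        v = w + beta * (w - w_prev)             # extrapolation (look-ahead) point
        eta = 1.0 / np.sqrt(t)                  # decaying step size
        w_prev, w = w, v - eta * hinge_l1_subgrad(v, X, y, lam)
    return w                                    # individual (last) iterate, not an average

# Tiny synthetic linearly separable problem
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, -1.5, 0.0, 0.0, 0.5])
y = np.sign(X @ true_w)
w_hat = psg_nesterov(X, y)
accuracy = np.mean(np.sign(X @ w_hat) == y)
```

The key point matching the abstract is that the returned solution is the last individual iterate `w` itself, which keeps the sparsity induced by the l1 term, rather than an average of iterates, which tends to destroy it.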


