Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

12/29/2020
by Wei Tao et al.

Averaging schemes have attracted extensive attention in deep learning as well as in traditional machine learning. They achieve theoretically optimal convergence rates and also improve empirical model performance. However, a sufficient convergence analysis is still lacking for strongly convex optimization. Typically, the convergence of the last iterate of gradient descent methods, referred to as individual convergence, fails to attain the optimal rate because of an extra logarithmic factor. To remove this factor, we first develop gradient descent averaging (GDA), a general projection-based dual averaging algorithm for the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), in which primal and dual averaging schemes are utilized simultaneously. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA achieves the optimal individual convergence rate. Several experiments on SVMs and deep learning models validate the correctness of the theoretical analysis and the effectiveness of the algorithms.
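To make the averaging idea concrete, the sketch below shows a generic projected gradient method with t-weighted iterate averaging for a mu-strongly convex objective. It is a minimal illustration under standard assumptions (step size 2/(mu*(t+1)) and weights proportional to t, as in classical analyses of strongly convex (sub)gradient methods), not an implementation of the paper's GDA or SC-PDA; the function names and the toy problem are ours. It demonstrates the point made in the abstract: a suitably weighted average of the iterates attains an O(1/T) rate, whereas the last iterate of plain gradient descent with step size 1/(mu*t) is known to attain only O(log T / T).

```python
import numpy as np

def weighted_avg_gd(grad, project, x0, mu, T):
    """Projected gradient descent with t-weighted iterate averaging.

    A generic illustration (not the paper's GDA/SC-PDA): for a
    mu-strongly convex objective, step sizes eta_t = 2 / (mu * (t + 1))
    combined with weights proportional to t yield an O(1/T) rate on
    the weighted average, removing the log(T) factor that the last
    iterate of plain gradient descent suffers from.
    """
    x = x0.copy()
    x_avg = np.zeros_like(x0)
    weight_sum = 0.0
    for t in range(1, T + 1):
        eta = 2.0 / (mu * (t + 1))               # classical strongly convex step size
        x = project(x - eta * grad(x))           # projected gradient step
        weight_sum += t
        x_avg += (t / weight_sum) * (x - x_avg)  # running t-weighted average
    return x_avg

# Toy usage: minimize f(x) = 0.5 * mu * ||x - b||^2 over the unit ball.
mu, b = 1.0, np.array([2.0, -1.0])
grad = lambda x: mu * (x - b)
project = lambda x: x / max(1.0, np.linalg.norm(x))  # projection onto unit ball
x_out = weighted_avg_gd(grad, project, np.zeros(2), mu, T=1000)
print(x_out)  # close to b / ||b||, the constrained minimizer
```

The t-weighting discounts the early, inaccurate iterates more aggressively than uniform averaging does, which is what removes the logarithmic factor in the strongly convex case.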
