Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent

02/24/2020
by Ehsan Amid, et al.

Continuous-time mirror descent (CMD) can be seen as the limit of the discrete-time MD update as the step size becomes infinitesimally small. In this paper, we focus on the geometry of the primal and dual CMD updates and introduce a general framework for reparameterizing one CMD update as another. Specifically, the reparameterized update also corresponds to a CMD, but on the composite loss w.r.t. the new variables, and the original variables are recovered via the reparameterization map. We employ these results to introduce a new family of reparameterizations that interpolate between the two commonly used updates, namely continuous-time gradient descent (GD) and unnormalized exponentiated gradient (EGU), while extending to many other well-known updates. In particular, we show that for the underdetermined linear regression problem, these updates generalize the known behavior of GD and EGU and provably converge to the minimum L_{2-τ}-norm solution for τ ∈ [0,1]. Our new results also have implications for regularized training of neural networks to induce sparsity.
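The interpolation the abstract describes is easiest to see at its two endpoints. The NumPy sketch below, which is an illustration rather than the paper's code, contrasts plain continuous-time GD (the τ = 0 endpoint) with GD run after the well-known quadratic reparameterization w = u²/4, under which gradient flow on u becomes the EGU flow on w (the τ = 1 endpoint). The problem instance, step size, and initialization are assumptions made for this demo, and the paper's interpolating family for intermediate τ is not reproduced here.

```python
import numpy as np

# A minimal sketch (not the paper's code): Euler-discretized gradient flow on an
# underdetermined linear regression problem in two parameterizations.
#   * Plain GD on w is the tau = 0 endpoint; from w = 0 it converges to the
#     minimum L2-norm interpolant.
#   * GD on u with the quadratic reparameterization w = u**2 / 4 turns gradient
#     flow on u into the EGU flow dw/dt = -w * dL/dw (the tau = 1 endpoint),
#     which from a small initialization tends to favor sparse, small-L1 solutions.
# Problem size, target, step size, and initialization are illustrative assumptions.

rng = np.random.default_rng(0)
n, d = 10, 40                       # underdetermined: fewer examples than weights
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:3] = [2.0, 1.0, 0.5]        # sparse, nonnegative target (w = u**2/4 keeps w >= 0)
y = X @ w_star

def loss_grad(w):
    """Gradient of the squared loss 0.5 * ||X w - y||^2 with respect to w."""
    return X.T @ (X @ w - y)

eta, steps = 1e-3, 100_000

# tau = 0 endpoint: plain gradient descent on w, started at zero.
w_gd = np.zeros(d)
for _ in range(steps):
    w_gd -= eta * loss_grad(w_gd)

# tau = 1 endpoint: gradient descent on u with w = u**2 / 4.
# Chain rule: dL/du = (u/2) * dL/dw, so du/dt = -(u/2) * dL/dw, which in
# w-coordinates is dw/dt = -w * dL/dw, i.e. continuous-time EGU.
u = 0.2 * np.ones(d)                # w = u**2 / 4 = 0.01 at initialization
for _ in range(steps):
    u -= eta * (u / 2.0) * loss_grad(u**2 / 4.0)
w_egu = u**2 / 4.0

for name, w in [("GD (tau=0)", w_gd), ("EGU-reparam (tau=1)", w_egu)]:
    print(f"{name:20s}  L1 = {np.abs(w).sum():6.3f}  L2 = {np.linalg.norm(w):6.3f}"
          f"  residual = {np.linalg.norm(X @ w - y):.1e}")
```

The printed norms should show the expected contrast: the GD run attains the smaller L2 norm (it converges to the minimum L2-norm interpolant), while the reparameterized run tends to concentrate its weight on the support of the sparse target and to have the smaller L1 norm, in line with the minimum L_{2-τ}-norm behavior stated in the abstract.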
