Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent
Continuous-time mirror descent (CMD) can be seen as the limiting case of the discrete-time MD update when the step size is infinitesimally small. In this paper, we focus on the geometry of the primal and dual CMD updates and introduce a general framework for reparameterizing one CMD update as another. Specifically, the reparameterized update also corresponds to a CMD, but on the composite loss w.r.t. the new variables, and the original variables are obtained via the reparameterization map. We employ these results to introduce a new family of reparameterizations that interpolate between the two commonly used updates, namely continuous-time gradient descent (GD) and the unnormalized exponentiated gradient (EGU) update, while extending to many other well-known updates. In particular, we show that for the underdetermined linear regression problem, these updates generalize the known behavior of GD and EGU and provably converge to the minimum L_{2-τ}-norm solution for τ ∈ [0,1]. Our new results also have implications for regularized training of neural networks to induce sparsity.
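To make the reparameterization idea concrete, the sketch below illustrates only the two endpoints of the interpolation on an underdetermined linear regression problem. It uses the known fact that running continuous-time GD on u with w = u²/4 reproduces continuous-time EGU on w (by the chain rule, ẇ = -w ⊙ ∇L(w)); the paper's intermediate-τ maps are not reproduced here. This is a minimal NumPy sketch under assumptions not stated in the abstract: a small-step Euler discretization stands in for the continuous-time flow, a nonnegative sparse target is used because w = u²/4 keeps weights nonnegative, and all names (eta, steps, w_gd, w_egu) are illustrative.

```python
# Minimal sketch: plain GD vs. reparameterized GD (w = u*u/4, i.e. continuous-time EGU)
# on underdetermined linear regression. Euler steps approximate the continuous-time flow.
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined problem: fewer observations than parameters, sparse nonnegative target.
n, d = 20, 100
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:5] = 1.0
y = X @ w_star

def grad(w):
    """Gradient of the squared loss 0.5 * ||X w - y||^2 with respect to w."""
    return X.T @ (X @ w - y)

eta, steps = 1e-3, 200_000

# (1) Plain GD from zero initialization: converges to the minimum L2-norm interpolator.
w_gd = np.zeros(d)
for _ in range(steps):
    w_gd -= eta * grad(w_gd)

# (2) GD on u with w = u*u/4: an Euler discretization of continuous-time EGU.
#     With a small positive initialization it favors sparse (small L1-norm) solutions.
u = np.full(d, 2e-2)                      # w0 = u0^2 / 4 = 1e-4
for _ in range(steps):
    w = u * u / 4.0
    u -= eta * (u / 2.0) * grad(w)        # chain rule: d/du L(u^2/4) = (u/2) * grad_w L
w_egu = u * u / 4.0

for name, w in [("GD", w_gd), ("reparameterized GD (EGU)", w_egu)]:
    print(f"{name:>25}: L1 = {np.abs(w).sum():.3f}, "
          f"L2 = {np.linalg.norm(w):.3f}, residual = {np.linalg.norm(X @ w - y):.2e}")
```

Running the sketch, both updates fit the data (small residual), but the plain GD solution has the smaller L2 norm while the reparameterized update concentrates its mass on few coordinates, consistent with the L_{2-τ} interpolation described in the abstract at its τ = 0 and τ = 1 endpoints.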