Exponentiated Gradient Meets Gradient Descent

02/05/2019
by Udaya Ghai, et al.

(Stochastic) gradient descent and the multiplicative update method are among the most popular algorithms in machine learning. We introduce and study a new regularization that unifies the additive and multiplicative updates. This regularization is derived from a hyperbolic analogue of the entropy function, which we call hypentropy. It is motivated by a natural extension of the multiplicative update to negative numbers. The hypentropy has a natural spectral counterpart, which we use to derive a family of matrix-based updates that bridge gradient methods and the multiplicative method for matrices. While the latter is only applicable to positive semi-definite matrices, the spectral hypentropy method can naturally be used with general rectangular matrices. We analyze the new family of updates by deriving tight regret bounds. We study empirically the applicability of the new update in settings such as multiclass learning, in which the parameters constitute a general rectangular matrix.
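As a rough illustration of the update family described in the abstract, the following is a minimal NumPy sketch, not the authors' reference implementation. It assumes the hypentropy mirror map acts coordinate-wise as arcsinh(w/beta) on vectors (with inverse beta*sinh), and that the spectral version applies the same scalar map to the singular values of the parameter matrix; the function names, the step size eta, and the parameter beta are illustrative choices.

```python
import numpy as np

def hypentropy_update(w, grad, eta=0.1, beta=1.0):
    # One unconstrained mirror-descent step with the hypentropy mirror map:
    # move in the dual space arcsinh(w / beta), then map back with beta * sinh.
    # Large beta makes the map nearly linear (additive gradient descent);
    # small beta behaves like a signed multiplicative (EG-style) update.
    return beta * np.sinh(np.arcsinh(w / beta) - eta * grad)

def spectral_hypentropy_update(W, G, eta=0.1, beta=1.0):
    # Matrix analogue (sketch): apply the same scalar map to the singular
    # values, so the step is well defined for general rectangular matrices.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    dual = U @ np.diag(np.arcsinh(s / beta)) @ Vt - eta * G
    U2, s2, Vt2 = np.linalg.svd(dual, full_matrices=False)
    return U2 @ np.diag(beta * np.sinh(s2)) @ Vt2
```

The parameter beta is the interpolation knob: as beta grows the arcsinh map flattens toward the identity and the step approaches plain gradient descent, while for small beta it approaches an exponentiated-gradient-style update that also accommodates negative coordinates.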


Related research

02/24/2020
Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent
Continuous-time mirror descent (CMD) can be seen as the limit case of th...

07/06/2023
Multiplicative Updates for Online Convex Optimization over Symmetric Cones
We study online convex optimization where the possible actions are trace...

06/16/2015
Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates
In this paper, we provide a novel construction of the linear-sized spect...

09/11/2019
An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint
We shed new insights on the two commonly used updates for the online k-P...

02/16/2018
Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks
We analyze algorithms for approximating a function f(x) = Φ x mapping ^d...

06/01/2021
A Non-commutative Extension of Lee-Seung's Algorithm for Positive Semidefinite Factorizations
Given a matrix X ∈ ℝ_+^{m×n} with nonnegative entries, a Positive Semidefin...

04/17/2017
Sparse Communication for Distributed Gradient Descent
We make distributed stochastic gradient descent faster by exchanging spa...
