CompAdaGrad: A Compressed, Complementary, Computationally-Efficient Adaptive Gradient Method

09/12/2016
by Nishant A. Mehta, et al.

The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems, and more recently in deep learning methods. The method's full-matrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, the full-matrix version is computationally prohibitive, so the simpler diagonal version is often used in practice. We introduce a new method, CompAdaGrad, that navigates the space between these two schemes and show that it can yield results much better than diagonal AdaGrad while avoiding the (effectively intractable) O(n^3) computational complexity of full-matrix AdaGrad in dimension n. CompAdaGrad essentially performs full-matrix regularization in a low-dimensional subspace while performing diagonal regularization in the complementary subspace. We derive CompAdaGrad's updates for composite mirror descent in the case of the squared ℓ_2 norm and the ℓ_1 norm, demonstrate that its complexity per iteration is linear in the dimension, and establish guarantees for the method independent of the choice of composite regularizer. Finally, we show preliminary results on several datasets.
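To make the split between the two subspaces concrete, the sketch below is a minimal, hypothetical illustration of this idea, not the authors' implementation. It fixes a random orthonormal basis P for a k-dimensional subspace, accumulates a full k×k second-moment matrix for the gradient component inside that subspace, and a diagonal accumulator for the component in the orthogonal complement. The function name compadagrad_sketch, the random choice of subspace, and all parameter values are illustrative assumptions (the paper selects the subspace and updates more carefully).

```python
import numpy as np

def compadagrad_sketch(grad_fn, x0, n_steps=100, k=10, eta=0.1, eps=1e-8, seed=0):
    """Hypothetical CompAdaGrad-style update: full-matrix AdaGrad in a fixed
    k-dimensional subspace, diagonal AdaGrad in the orthogonal complement."""
    rng = np.random.default_rng(seed)
    n = x0.size
    # Fixed orthonormal basis P (n x k) for the low-dimensional subspace;
    # a random basis is used here purely for illustration.
    P, _ = np.linalg.qr(rng.standard_normal((n, k)))
    G_low = np.zeros((k, k))   # full second-moment matrix in the subspace
    h_diag = np.zeros(n)       # diagonal accumulator for the complement
    x = x0.astype(float).copy()
    for _ in range(n_steps):
        g = grad_fn(x)
        g_low = P.T @ g                   # component inside the subspace
        g_comp = g - P @ g_low            # component in the complement
        G_low += np.outer(g_low, g_low)   # full-matrix accumulation, O(k^2)
        h_diag += g_comp ** 2             # diagonal accumulation, O(n)
        # Apply (G_low)^{-1/2} via an eigendecomposition of the k x k matrix.
        w, V = np.linalg.eigh(G_low)
        step_low = V @ ((V.T @ g_low) / (np.sqrt(np.maximum(w, 0.0)) + eps))
        step_comp = g_comp / (np.sqrt(h_diag) + eps)
        x -= eta * (P @ step_low + step_comp)
    return x

# Toy usage: least squares, gradient of 0.5 * ||Ax - b||^2.
A = np.random.default_rng(1).standard_normal((50, 200))
b = A @ np.ones(200)
x_hat = compadagrad_sketch(lambda x: A.T @ (A @ x - b),
                           np.zeros(200), n_steps=500, k=20)
```

For fixed k, each iteration of this sketch costs O(nk + k^3): the projection and diagonal update are linear in n, and the full-matrix work is confined to the k×k subspace, consistent with the abstract's claim of per-iteration complexity linear in the dimension.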
