Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective

05/31/2021
by Kushal Chakrabarti, et al.

Accelerated gradient-based methods are extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the simplest accelerated gradient method, and it is particularly effective for sparse data. Adam has been shown to perform favorably in deep learning problems compared to other methods. In this paper, we propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially non-convex machine learning problems. Specifically, we adopt a state-space perspective for analyzing the convergence of gradient acceleration algorithms, namely G-AdaGrad and Adam, in machine learning. Our proposed state-space models are governed by ordinary differential equations. We present simple convergence proofs of these two algorithms in the deterministic setting, with minimal assumptions. Our analysis also provides intuition for improving upon AdaGrad's convergence rate. We provide empirical results on the MNIST dataset to reinforce our claims on the convergence and performance of G-AdaGrad and Adam.
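To make the abstract's objects concrete, one plausible continuous-time (state-space) model of an AdaGrad-type method tracks the accumulated squared-gradient magnitude as a state variable; the exponent alpha below is an illustrative reading of the "generalized" in G-AdaGrad (alpha = 1/2 recovers standard AdaGrad) and is not a definition taken from the paper:

```latex
% A plausible ODE state-space model of AdaGrad-type dynamics (sketch only;
% the paper's exact formulation may differ):
\dot{x}_c(t) = \lVert \nabla f(x(t)) \rVert^2, \qquad
\dot{x}(t) = -\frac{\nabla f(x(t))}{\big(x_c(t)\big)^{\alpha}}, \qquad \alpha \in (0,1).
```

The discrete counterparts can be sketched as follows. This is a minimal illustration, not the paper's implementation: the G-AdaGrad exponent and all hyperparameter defaults are assumptions, while the Adam step follows the standard form of Kingma and Ba:

```python
import numpy as np

def g_adagrad_step(x, grad, accum, lr=0.01, alpha=0.5, eps=1e-8):
    """One AdaGrad-type step with a free exponent alpha.
    alpha = 0.5 recovers standard AdaGrad; treating 'generalized' as a
    free exponent on the accumulator is an illustrative assumption."""
    accum = accum + grad ** 2                    # running sum of squared gradients
    x = x - lr * grad / (accum ** alpha + eps)   # eps guards against division by zero
    return x, accum

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step with bias correction (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Usage: deterministic minimization of f(x) = ||x||^2 with exact gradients,
# mirroring the paper's deterministic setting (the objective is a toy example).
x = np.array([5.0, -3.0])
accum = np.zeros_like(x)
for t in range(1, 501):
    grad = 2.0 * x
    x, accum = g_adagrad_step(x, grad, accum)
print(x)  # approaches the minimizer [0, 0]
```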


Related research

06/01/2018 · Run Procrustes, Run! On the convergence of accelerated Procrustes Flow
In this work, we present theoretical results on the convergence of non-c...

10/11/2019 · Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
We investigate the theoretical limits of pipeline parallel learning of d...

02/21/2020 · Asynchronous parallel adaptive stochastic gradient methods
Stochastic gradient methods (SGMs) are the predominant approaches to tra...

01/30/2023 · Reweighted Interacting Langevin Diffusions: an Accelerated Sampling Method for Optimization
We proposed a new technique to accelerate sampling methods for solving d...

06/05/2019 · Data Sketching for Faster Training of Machine Learning Models
Many machine learning problems reduce to the problem of minimizing an ex...

09/09/2023 · A Gentle Introduction to Gradient-Based Optimization and Variational Inequalities for Machine Learning
The rapid progress in machine learning in recent years has been based on...

09/19/2022 · BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
Bilevel optimization (BO) is useful for solving a variety of important m...
