1 Introduction
Adaptive gradient methods such as Adam (Kingma & Ba (2014)), Adagrad, Adadelta (Zeiler (2012)), RMSProp (Hinton et al. (2012)), Nadam (Dozat (2016)), and AdamW (Loshchilov & Hutter (2017)) were proposed as alternatives to SGD with momentum for optimizing stochastic objectives in high dimensions. Amsgrad was recently proposed as an improvement to Adam that fixes convergence issues in the latter. These methods offer benefits such as faster convergence and insensitivity to hyperparameter selection, i.e., they are demonstrated to work well with little tuning. On the downside, these adaptive methods have shown poorer empirical performance and weaker generalization compared to SGD with momentum. The authors of Padam attribute this phenomenon to the "over-adaptiveness" of the adaptive methods. The key contributions of Padam, as stated by the authors, are:

The authors put forward that Padam unifies Adam/Amsgrad and SGD with momentum via a partially adaptive parameter, and that Adam/Amsgrad can be seen as a special, fully adaptive instance of Padam. They further claim that Padam resolves the "small learning rate dilemma" for adaptive gradient methods and allows for faster convergence, hence closing the generalization gap.

The authors claim that Padam generalizes as well as SGD with momentum while achieving the fastest convergence.
We address and comment on each of the above claims from an empirical point of view. We run additional experiments to study the effect of the learning rate (and its schedule) on the optimal value of the partially adaptive parameter p. Based on our analysis, we propose varying p on a suitable schedule as training proceeds, in order to actually get the best of both worlds.
2 Background
Padam is inspired by two recent adaptive techniques, Adam and Amsgrad, which we discuss briefly here. Adam uses bias-corrected first- and second-order moment estimates of the gradients for the weight update:

$$\theta_{t+1} = \theta_t - \frac{\alpha_t}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t, \quad \text{where } m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t,\ \ v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2,\ \ \hat{m}_t = \frac{m_t}{1-\beta_1^t},\ \ \hat{v}_t = \frac{v_t}{1-\beta_2^t} \quad \text{(Adam)}$$
A convergence issue in Adam was recently uncovered and addressed by Amsgrad, which tweaks the update rule slightly to fix it. Padam builds on this updated algorithm:

$$\theta_{t+1} = \theta_t - \frac{\alpha_t}{\sqrt{\hat{v}_t} + \epsilon}\, m_t, \quad \text{where } \hat{v}_t = \max(\hat{v}_{t-1}, v_t) \quad \text{(Amsgrad)}$$
2.1 Padam Algorithm
Padam introduces a new partially adaptive parameter p that takes values in the range [0, 0.5], replacing the square root in the denominator with $\hat{v}_t^{\,p}$. At the extremes of this range it takes the form of SGD with momentum or AMSGrad: from Algorithm 1, setting p to 0.0 reduces Padam to SGD with momentum, whereas setting it to 0.5 leaves us with the AMSGrad optimizer.
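As a concrete illustration, a Padam-style update step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' TensorFlow implementation; the default hyperparameter values are assumptions chosen to match common settings.

```python
import numpy as np

def padam_step(theta, grad, m, v, v_hat,
               lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One Padam-style update on parameters `theta` (illustrative sketch).

    p = 0.5 gives an AMSGrad-like step, while p = 0.0 reduces to
    SGD with momentum (up to the small eps term).
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    v_hat = np.maximum(v_hat, v)                 # AMSGrad max correction
    theta = theta - lr * m / (v_hat ** p + eps)  # partially adaptive step
    return theta, m, v, v_hat
```

With p = 0 the denominator collapses to 1 (plus eps), leaving a pure momentum step; with p = 0.5 the step is fully adaptive.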
3 Experiments
In this section we describe our experimental settings for evaluating Padam. We have tried to keep our implementation faithful to the authors' code. We build the three CNN architectures proposed in the paper and compare Padam's performance against the other baseline algorithms. We built the Amsgrad and Padam optimizers on top of the base code of Adam in TensorFlow (https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/training/adam.py).
3.1 Environmental Setup
We built the experiments using TensorFlow version 1.13.0 with the graph-free Eager Execution mode under Python 3.5.2. We ran the experiments on four Tesla Xp GPUs (12 GB RAM per GPU).
3.2 Datasets
The experiments were conducted on two popular image-classification datasets: CIFAR-10 and CIFAR-100 (Krizhevsky (2009)). The performance of the various optimizers on these datasets was evaluated with three different CNN architectures: VGGNet (Simonyan & Zisserman (2014)), ResNet (He et al. (2016)), and Wide ResNet (Zagoruyko & Komodakis (2016)). We run the CIFAR-10 and CIFAR-100 tasks for 200 epochs, with a learning rate decay at every 50th epoch, i.e. at epochs 50, 100, and 150. We were unable to perform the experiments on the ImageNet dataset because of time constraints and the limited availability of computing resources.
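The step decay used for the CIFAR runs can be written as a one-line schedule. A minimal sketch; the decay factor of 0.1 is an assumption carried over from the factor the authors use in their p grid search.

```python
def step_decay_lr(epoch, base_lr=0.1, decay=0.1, step=50):
    """Learning rate after step decay: drops by `decay` every `step`
    epochs, i.e. at epochs 50, 100 and 150 over a 200-epoch run."""
    return base_lr * decay ** (epoch // step)
```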
3.3 Baseline Algorithms
We compare Padam against the most popular adaptive gradient optimizers and SGD with momentum. Note that the evaluation against AdamW was added by the authors at a later stage, and the details about it were not completely disclosed in the updated version of the paper or code; owing to this delay, we have not been able to carry out experiments with AdamW.
| Optimizer | Padam | SGD + Momentum | Adam | Amsgrad |
| --- | --- | --- | --- | --- |
| Initial learning rate | 0.1 | 0.1 | 0.001 | 0.001 |
| Beta1 | 0.9 | - | 0.9 | 0.9 |
| Beta2 | 0.999 | - | 0.99 | 0.99 |
| Weight decay | 0.0005 | 0.0005 | 0.0001 | 0.0001 |
| Momentum | - | 0.9 | - | - |

Note: we have used p = 0.125 as the Padam hyperparameter for the experiments unless specifically mentioned.
Figure 1: Architectures used in the experiments. Note that for brevity we do not explicitly show the batch normalization layer after each convolution operation.
3.4 Architectures
We have built the architectures faithful to the code released by the authors; they are shown in Figure 1.
3.4.1 VGGNet
The VGG-16 network uses only 3×3 convolutional layers stacked on top of each other in increasing depth, and adopts max pooling to reduce the volume size. Finally, two fully-connected layers are followed by a softmax classifier.
3.4.2 ResNet
Residual Neural Networks (ResNet) (He et al. (2016)) introduce a novel architecture with "skip connections" and feature heavy use of batch normalization. Following the authors, we use ResNet-18 for this experiment, which contains 4 blocks, each comprising 2 basic building blocks.
3.4.3 Wide ResNet
Wide Residual Networks (Zagoruyko & Komodakis (2016)) further exploit the "skip connections" used in ResNet while also increasing the width of the residual blocks. In detail, we use the 16-layer Wide ResNet with a width multiplier of 4 (WRN-16-4) in the experiments.
4 Evaluation and Results
In this section we comment on the results we obtained with this reproducibility effort. We divide this section into four parts.
4.1 Train Experiments
Padam is compared with the other proposed baselines. Figure 2 shows the top-5 test error, and Figure 3 shows the train loss and test error for the three architectures on CIFAR-10. We find that Padam performs comparably with SGD with momentum on all three architectures in test error, and maintains a rate of convergence between Adam/AMSGrad and SGD. Padam works as proposed by the authors to bridge the gap between adaptive methods and SGD, at the cost of introducing a new hyperparameter p that requires tuning. However, we do not see a clear motivation behind the grid-search approach used by the authors to select the value of this partially adaptive parameter p.
The results on CIFAR-100 can be found in the appendix.
| Methods | 50th epoch | 100th epoch | 150th epoch | 200th epoch |
| --- | --- | --- | --- | --- |
| SGD Momentum | 68.71 | 87.88 | 92.94 | 92.95 |
| Adam | 84.62 | 91.54 | 92.34 | 92.39 |
| Amsgrad | 87.89 | 91.75 | 92.26 | 92.19 |
| Padam | 67.92 | 90.86 | 93.08 | 93.06 |
4.2 pvalue Experiments
In order to find an optimal working value of p, the authors perform a grid search over three values: {0.25, 0.125, 0.0625}. They do so while keeping the base learning rate fixed at 0.1 and decaying it by a factor of 0.1 every 30 epochs. We perform the same experiment on CIFAR-10 and CIFAR-100; the results are plotted in Figure 4. We observe results similar to the authors' and, of the three proposed values, find p = 0.125 to work best. Nevertheless, we would like to stress that this value of p may still be suboptimal and may turn out to be sensitive to the base learning rate. To analyze this, we perform sensitivity experiments of p against various learning rates, and it turns out that p is indeed sensitive to it.
4.3 Sensitivity Experiments
To evaluate the possibility that the optimal value of the partially adaptive parameter p is entangled with the learning rate, we run sensitivity experiments. We fix p to each of the three values {0.25, 0.125, 0.0625}, and for each fixed value of p we vary the base learning rate over {0.1, 0.01, 0.001}. We run each configuration for 30 epochs on CIFAR-10 and CIFAR-100 with ResNet. We expect this to uncover the dependence of p on the base learning rate.
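The sweep described above amounts to the cross product of the candidate p values and base learning rates. A small sketch enumerating the nine configurations (the training loop itself is omitted):

```python
from itertools import product

# The three candidate p values and three base learning rates from the text.
p_values = [0.25, 0.125, 0.0625]
base_lrs = [0.1, 0.01, 0.001]

# Nine (p, lr) configurations, each run for 30 epochs in our experiments.
configs = list(product(p_values, base_lrs))
```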
The results for CIFAR-10 are plotted in Figure 5. From Figure 5(b) we observe that with p = 0.25, a base learning rate of 0.01 or 0.001 appears to be a more appropriate choice than 0.1, owing to its better test-error performance. As we decrease the value of p, higher base learning rates start performing better, as is evident from Figures 5(d) and 5(f). This observation supports the argument that p is indeed sensitive to the base learning rate.
The results of the sensitivity experiments on CIFAR-100 are deferred to the appendix.
4.4 Proposed Further Study
From the sensitivity experiments we can infer that with higher values of p, Padam behaves more adaptive-like (it performs better with lower learning rates), while with smaller values of p, Padam demonstrates behavior closer to SGD (it performs better with higher learning rates).
The primary objective behind Padam is to achieve two things: good convergence (initially) and better generalization (finally). To do so, we would like Padam to behave adaptive-like initially and SGD-like finally. This way, Padam would be able to exploit both worlds to their fullest within the training lifecycle.
To this end, we propose initializing Padam with a high p and a low base learning rate, and then decaying p over the course of training. Correspondingly, the learning rate can be mildly decreased in the middle or towards the end of the training cycle in order to create the conditions for the SGD-like Padam to converge.
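This proposal can be sketched as a simple schedule. All the constants here (`p_start`, `p_end`, the decay point and factor) are hypothetical values for illustration, not tuned settings from our experiments:

```python
def padam_schedule(epoch, total_epochs=200,
                   p_start=0.25, p_end=0.0625,
                   base_lr=0.001, lr_decay=0.5, lr_decay_epoch=150):
    """Decay p linearly from an adaptive-like value towards an SGD-like
    value, and mildly decrease the (initially low) learning rate late in
    training so the SGD-like phase can converge. Hypothetical sketch."""
    frac = min(epoch / total_epochs, 1.0)
    p = p_start + frac * (p_end - p_start)
    lr = base_lr * lr_decay if epoch >= lr_decay_epoch else base_lr
    return p, lr
```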
Recently, AdamW has demonstrated better generalization by decoupling the weight-decay mechanism from the update rule; this method could further complement Padam's results. We have not been able to finish running these proposed experiments due to time and resource constraints.
5 Discrepancies, Suggestions and Conclusion
The authors argue that adaptive gradient methods, when used with a larger base learning rate, give rise to the gradient explosion problem because of the presence of the second-order moment term in the denominator. This proposition implicitly assumes the second-order moment term $\hat{v}_t$ to be between 0 and 1, which might not always be the case; hence the factor $1/\hat{v}_t^{\,p}$ may cause the effective learning rate to either increase or decrease.
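This point is easy to check numerically: depending on whether the second-order moment term is below or above 1, the partially adaptive factor inflates or shrinks the effective step (a small sketch; the `v_hat` values below are arbitrary illustrations):

```python
def effective_lr(lr, v_hat, p=0.125):
    """Effective step size after dividing by v_hat**p. The factor
    amplifies the step when v_hat < 1 and dampens it when v_hat > 1."""
    return lr / (v_hat ** p)
```

For example, `effective_lr(0.1, 1e-4)` is roughly 0.32, larger than the base rate, while `effective_lr(0.1, 1e2)` is roughly 0.056.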
Overall, we conclude from our empirical evaluation that Padam is capable of mixing the benefits of adaptive gradient methods with those of SGD with momentum. Studying the newly introduced partially adaptive parameter p further seems a good direction for continuing this work.
References

Timothy Dozat. Incorporating Nesterov momentum into Adam, 2016.

Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent, 2012.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016. doi: 10.1109/cvpr.2016.90. URL http://dx.doi.org/10.1109/CVPR.2016.90.

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.

Alex Krizhevsky. Learning multiple layers of features from tiny images, 2009.

Ilya Loshchilov and Frank Hutter. Fixing weight decay regularization in Adam, 2017.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks, 2016.

Matthew D. Zeiler. Adadelta: An adaptive learning rate method, 2012.
Appendix A Experiments on CIFAR-100

| Methods | 50th epoch | 100th epoch | 150th epoch | 200th epoch |
| --- | --- | --- | --- | --- |
| SGD Momentum | 37.29 | 61.18 | 71.55 | 71.54 |
| Adam | 55.44 | 65.67 | 66.70 | 66.65 |
| Amsgrad | 58.85 | 68.21 | 69.94 | 69.95 |
| Padam | 42.05 | 66.92 | 72.04 | 72.08 |