signADAM: Learning Confidences for Deep Neural Networks

07/21/2019
by Dong Wang, et al.

In this paper, we propose a new first-order gradient-based algorithm for training deep neural networks. We first introduce the sign operation of stochastic gradients (as in sign-based methods, e.g., SIGN-SGD) into ADAM, yielding an algorithm we call signADAM. Moreover, in order to bring the rates at which different features are fitted closer together, we define a confidence function that distinguishes different components of the gradients and apply it to our algorithm. The resulting method, which we call signADAM++, generates sparser gradients than existing algorithms do. Both of our algorithms are easy to implement and can speed up the training of various deep neural networks. The motivation behind signADAM++ is to preferentially learn features from the most distinctive samples by updating only the large and useful gradient components while ignoring the useless information in stochastic gradients. We also establish theoretical convergence guarantees for our algorithms. Empirical results on various datasets and models show that our algorithms yield much better performance than many state-of-the-art algorithms, including SIGN-SGD, SIGNUM and ADAM. We also analyze the performance from multiple perspectives, including the loss landscape, and develop an adaptive method to further improve generalization. The source code is available at https://github.com/DongWanginxdu/signADAM-Learn-by-Confidence.
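The abstract does not state the exact update rule or the form of the confidence function, so the following is only a minimal sketch of the two ideas it describes: an ADAM-like update driven by the sign of the stochastic gradient, plus an assumed magnitude threshold `tau` standing in for the confidence function, which zeroes out small gradient components and yields sparser updates. All parameter names and the thresholding form are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def signadam_pp_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, tau=1e-3):
    """One hypothetical signADAM++-style step (sketch, not the paper's rule).

    param, grad, m, v : numpy arrays of the same shape
    t                 : 1-based iteration counter
    tau               : assumed confidence threshold on gradient magnitude
    """
    # Assumed confidence gate: keep only components whose magnitude exceeds
    # the threshold; the rest are treated as useless and set to zero.
    confident = np.abs(grad) > tau
    g = np.sign(grad) * confident          # sparse, sign-valued gradient

    # ADAM-style biased moment estimates, applied to the gated sign.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * (g ** 2)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Parameter update with the usual ADAM denominator.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Setting `tau = 0` in this sketch reduces it to the plain sign-of-gradient variant (the signADAM idea), while a positive `tau` illustrates how the confidence gate sparsifies the update, as described for signADAM++.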

