Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning

05/23/2019
by   Shuai Zheng, et al.
0

Stochastic methods with coordinate-wise adaptive stepsize (such as RMSprop and Adam) have been widely used in training deep neural networks. Despite their fast convergence, they can generalize worse than stochastic gradient descent. In this paper, by revisiting the design of Adagrad, we propose to split the network parameters into blocks, and use a blockwise adaptive stepsize. Intuitively, blockwise adaptivity is less aggressive than adaptivity to individual coordinates, and can have a better balance between adaptivity and generalization. We show theoretically that the proposed blockwise adaptive gradient descent has comparable convergence rate as its counterpart with coordinate-wise adaptive stepsize, but is faster up to some constant. We also study its uniform stability and show that blockwise adaptivity can lead to lower generalization error than coordinate-wise adaptivity. Experimental results show that blockwise adaptive gradient descent converges faster and improves generalization performance over Nesterov's accelerated gradient and Adam.

READ FULL TEXT
research
06/18/2018

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Adaptive gradient methods, which adopt historical gradient information t...
research
05/31/2023

Toward Understanding Why Adam Converges Faster Than SGD for Transformers

While stochastic gradient descent (SGD) is still the most popular optimi...
research
06/01/2015

Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection

There has been significant recent work on the theory and application of ...
research
03/17/2018

Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision

A number of results have recently demonstrated the benefits of incorpora...
research
08/23/2023

Layer-wise Feedback Propagation

In this paper, we present Layer-wise Feedback Propagation (LFP), a novel...
research
10/17/2021

Explaining generalization in deep learning: progress and fundamental limits

This dissertation studies a fundamental open challenge in deep learning ...
research
08/16/2018

Experiential Robot Learning with Accelerated Neuroevolution

Derivative-based optimization techniques such as Stochastic Gradient Des...

Please sign up or login with your details

Forgot password? Click here to reset