
Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent
Stochastic gradient descent (SGD) with constant momentum and its variant...

Momentum in Reinforcement Learning
We adapt the optimization's concept of momentum to reinforcement learnin...

Robust Sampling in Deep Learning
Deep learning requires regularization mechanisms to reduce overfitting a...

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
It is well-known that stochastic gradient noise (SGN) acts as implicit r...

Quasi-hyperbolic momentum and Adam for deep learning
Momentum-based acceleration of stochastic gradient descent (SGD) is wide...

Asynchrony begets Momentum, with an Application to Deep Learning
Asynchronous methods are widely used in deep learning, but have limited ...

SGD with Hardness Weighted Sampling for Distributionally Robust Deep Learning
Distributionally Robust Optimization (DRO) has been proposed as an alter...
Time-Delay Momentum: A Regularization Perspective on the Convergence and Generalization of Stochastic Momentum for Deep Learning
In this paper we study the convergence and generalization error bounds of stochastic momentum for deep learning from the perspective of regularization. To do so, we first interpret momentum as solving an ℓ_2-regularized minimization problem that learns the offset between any two successive model parameters. We call this time-delay momentum because the model parameter is updated only after a few iterations toward finding the minimizer. We then propose our learning algorithm, stochastic gradient descent (SGD) with time-delay momentum. We show that our algorithm can be interpreted as solving a sequence of strongly convex optimization problems using SGD. We prove that under mild conditions our algorithm converges to a stationary point at a rate of O(1/√K) and attains a generalization error bound of O(1/√(nδ)) with probability at least 1-δ, where K and n are the numbers of model updates and training samples, respectively. We demonstrate the empirical superiority of our algorithm in deep learning in comparison with state-of-the-art deep learning solvers.
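The abstract's description of the algorithm, updating the parameter only after a few SGD iterations on an ℓ_2-regularized subproblem that learns the offset between successive parameters, can be illustrated with a minimal sketch. This is one plausible reading of that description, not the paper's actual implementation; the function name, step sizes, and loop counts below are illustrative assumptions.

```python
import numpy as np

def sgd_time_delay_momentum(grad, theta0, lam=0.5, lr=0.1,
                            inner_steps=5, outer_steps=50):
    """Hedged sketch: each outer step approximately solves the strongly
    convex subproblem min_p E[f(theta + p)] + (lam/2)||p||^2 with a few
    SGD steps, then applies the learned offset p as the (time-delayed)
    parameter update."""
    theta = theta0.copy()
    for _ in range(outer_steps):
        p = np.zeros_like(theta)           # offset between successive parameters
        for _ in range(inner_steps):       # a few SGD steps toward the minimizer
            g = grad(theta + p) + lam * p  # stochastic gradient of the
            p -= lr * g                    # ℓ_2-regularized subproblem
        theta += p                         # update delayed until the inner loop ends
    return theta

# Toy usage: minimize f(x) = 0.5*||x||^2 given noisy gradient estimates.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
theta = sgd_time_delay_momentum(noisy_grad, np.array([5.0, -3.0]))
```

Because each inner loop minimizes a λ-strongly convex objective, the sketch matches the abstract's framing of the method as solving a sequence of strongly convex problems with SGD.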