On the Generalization of Stochastic Gradient Descent with Momentum

02/26/2021
by Ali Ramezani-Kebrya, et al.

While momentum-based methods, in conjunction with stochastic gradient descent (SGD), are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In this work, we first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees when SGD with standard heavy-ball momentum (SGDM) is run for multiple epochs. Then, for smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, i.e., SGD with early momentum (SGDEM), and show that it admits an upper bound on the generalization error. Hence, machine learning models can be trained for multiple epochs of SGDEM with a guarantee of generalization. Finally, for the special case of strongly convex loss functions, we find a range of momentum values for which multiple epochs of standard SGDM, as a special case of SGDEM, also generalize. Extending our generalization results, we also develop an upper bound on the expected true risk in terms of the number of training steps, the size of the training set, and the momentum parameter. Experimental evaluations verify that the numerical results are consistent with our theoretical bounds and confirm the effectiveness of SGDEM for smooth Lipschitz loss functions.
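The heavy-ball SGDM update adds a momentum term to each stochastic gradient step, and SGDEM simply switches that term off after an initial phase of training. The Python sketch below illustrates this idea under assumed names (sgdem, grad_fn, momentum_steps) and a toy least-squares example; it is a minimal illustration of the update rule, not the paper's implementation or experimental setup.

```python
import numpy as np

def sgdem(grad_fn, w0, data, lr=0.01, mu=0.9, momentum_steps=1000,
          num_steps=5000, seed=0):
    """SGD with early momentum (sketch): heavy-ball momentum is applied only
    for the first `momentum_steps` iterations, after which plain SGD is used.
    Setting momentum_steps = num_steps recovers standard SGDM; setting it to 0
    recovers vanilla SGD. Names and defaults are illustrative, not the paper's.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    w_prev = w.copy()
    for t in range(num_steps):
        z = data[rng.integers(len(data))]          # draw one training example
        beta = mu if t < momentum_steps else 0.0   # early momentum, then switch it off
        w_next = w - lr * grad_fn(w, z) + beta * (w - w_prev)  # heavy-ball update
        w_prev, w = w, w_next
    return w

# Toy usage: least-squares regression, one (x, y) pair per stochastic gradient.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
    data = list(zip(X, y))
    grad = lambda w, z: 2.0 * (z[0] @ w - z[1]) * z[0]   # gradient of (x.w - y)^2
    w_hat = sgdem(grad, np.zeros(5), data, lr=0.01, mu=0.9,
                  momentum_steps=500, num_steps=3000)
    print(np.round(w_hat, 2))
```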

Related research:

09/12/2018 · On the Stability and Convergence of Stochastic Gradient Descent with Momentum
While momentum-based methods, in conjunction with the stochastic gradien...

02/10/2021 · Stability of SGD: Tightness Analysis and Improved Bounds
Stochastic Gradient Descent (SGD) based methods have been widely used fo...

06/11/2019 · ADASS: Adaptive Sample Selection for Training Acceleration
Stochastic gradient descent (SGD) and its variants, including some accele...

02/14/2020 · Statistical Learning with Conditional Value at Risk
We propose a risk-averse statistical learning framework wherein the perf...

10/24/2019 · Diametrical Risk Minimization: Theory and Computations
The theoretical and empirical performance of Empirical Risk Minimization...

05/30/2019 · Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
We introduce novel variants of momentum by incorporating the variance of...

05/01/2021 · RATT: Leveraging Unlabeled Data to Guarantee Generalization
To assess generalization, machine learning scientists typically either (...
