Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum

03/22/2022
by Kirby Banman, et al.

Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on i.i.d. sampling. Yet SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge. In particular, we show that SGDm under covariate shift is a parametric oscillator, and so can suffer from a phenomenon known as resonance. We approximate the learning system as a time-varying system of ordinary differential equations and leverage existing theory to characterize the system's divergence/convergence as resonant/nonresonant modes. The theoretical result is limited to the linear setting with periodic covariate shift, so we empirically supplement it to show that resonance phenomena persist even under non-periodic covariate shift, nonlinear dynamics with neural networks, and optimizers other than SGDm.
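The abstract's central claim lends itself to a small simulation. Below is a minimal sketch, not the authors' code: SGDm with a fixed step-size on a 2-D linear least-squares problem whose input covariance alternates periodically between two Gaussians. All names and constants here (H1, H2, lr, beta, the batch size, and the candidate switching periods) are illustrative assumptions; whether a particular switching period excites a resonant mode depends on these choices.

```python
# Minimal sketch: SGD with momentum under periodic covariate shift.
# Constants are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

w_star = np.array([1.0, -1.0])   # true regression weights
H1 = np.diag([5.0, 0.1])         # input covariance, distribution 1
H2 = np.diag([0.1, 5.0])         # input covariance, distribution 2 (axes swapped)

def sample_batch(H, n=32):
    """Draw a batch from a zero-mean Gaussian with covariance H; labels are noiseless."""
    X = rng.multivariate_normal(np.zeros(2), H, size=n)
    return X, X @ w_star

def run_sgdm(period, steps=4000, lr=0.35, beta=0.9):
    """Run SGDm on least squares while the input distribution alternates every `period` steps."""
    w = np.zeros(2)
    v = np.zeros(2)
    for t in range(steps):
        H = H1 if (t // period) % 2 == 0 else H2   # periodic covariate shift
        X, y = sample_batch(H)
        grad = X.T @ (X @ w - y) / len(y)          # least-squares gradient
        v = beta * v + grad                        # heavy-ball momentum buffer
        w = w - lr * v
    return np.linalg.norm(w - w_star)

# Sweep the switching period: a period that matches a natural oscillation of
# the momentum dynamics can pump energy into the system (error grows), while
# other periods leave the iterates contracting toward w_star.
for period in (1, 5, 25, 100):
    print(f"period={period:4d}  final error={run_sgdm(period):.3e}")
```

Note that with either covariance held fixed, this step-size/momentum pair satisfies the classical heavy-ball stability condition (lr * lambda_max < 2(1 + beta)), so any growth observed in the sweep is attributable to the alternation itself rather than to a plainly unstable step-size, which is the parametric-resonance effect the paper formalizes.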


