
Empirically explaining SGD from a line search perspective

by Maximus Mutschler, et al.

Optimization in Deep Learning is mainly guided by vague intuitions and strong assumptions, with limited understanding of how and why these work in practice. To shed more light on this, our work provides a deeper understanding of how SGD behaves by empirically analyzing the trajectory taken by SGD from a line search perspective. Specifically, we perform a costly quantitative analysis of the full-batch loss along SGD trajectories of commonly used models trained on a subset of CIFAR-10. Our core results include that the full-batch loss along lines in the update step direction is highly parabolic. Furthermore, we show that there exists a learning rate with which SGD always performs almost exact line searches on the full-batch loss. Finally, we provide a different perspective on why increasing the batch size has almost the same effect as decreasing the learning rate by the same factor.
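The core measurement described above can be illustrated with a minimal sketch (not the authors' code): probe the loss along the line in the negative-gradient direction, fit a parabola to the sampled values, and read off the exact line-search step size from the parabola's vertex. A toy quadratic stands in for the full-batch loss here.

```python
import numpy as np

# Toy quadratic loss standing in for the full-batch loss (an assumption
# for illustration; the paper measures real network losses).
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w = np.array([1.0, -2.0])
d = -grad(w)                    # update step direction (full-batch gradient here)
ts = np.linspace(0.0, 0.5, 21)  # step sizes probed along the line
vals = np.array([loss(w + t * d) for t in ts])

# Fit l(t) ~ a*t^2 + b*t + c; the vertex -b/(2a) is the step size an
# exact line search on this one-dimensional slice would take.
a, b, c = np.polyfit(ts, vals, 2)
t_star = -b / (2 * a)

print(a > 0, t_star, loss(w + t_star * d) < loss(w))
```

For a truly parabolic slice, as the paper reports for the full-batch loss, the fitted vertex coincides with the exact line-search minimizer; for real losses it is an approximation whose quality is what the empirical analysis quantifies.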



