Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent

05/22/2017
by Chris Junchi Li, et al.

In this paper, we study the stochastic gradient descent (SGD) method for nonconvex statistical optimization problems from a diffusion approximation point of view. Using the theory of large deviations for random dynamical systems, we prove that, in the small-stepsize regime and in the presence of omnidirectional noise, the SGD iteration escapes from a local minimizer (resp. saddle point) in a number of iterations that depends exponentially (resp. linearly) on the inverse stepsize. We take deep neural networks as an example to study this phenomenon. Based on a new analysis of the mixing rate of multidimensional Ornstein-Uhlenbeck processes, our theory substantiates very recent empirical results by Keskar et al. (2016), suggesting that large batch sizes in synchronous training of deep neural networks lead to poor generalization error.
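To make the escape-time picture concrete, here is a minimal simulation sketch (not the paper's code) on a toy one-dimensional double-well objective f(x) = (x^2 - 1)^2. Near a minimizer the SGD iterates behave like a discretized diffusion, and averaging a mini-batch of B independent stochastic gradients shrinks the gradient-noise standard deviation by a factor of sqrt(B), so escape from a local minimizer, which is exponential in the inverse noise level per the large-deviation analysis, slows down dramatically as the batch grows. The objective, noise model, and all parameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # Gradient of the double-well potential f(x) = (x**2 - 1)**2,
    # with local minimizers at x = -1 and x = +1 and a saddle at x = 0.
    return 4.0 * x * (x**2 - 1.0)

def escape_time(eta, batch_size, noise_std=4.0, max_iters=200_000):
    # Run SGD from the minimizer x = -1 and count iterations until the
    # iterate crosses the barrier at x = 0. Averaging batch_size i.i.d.
    # stochastic gradients divides the noise std by sqrt(batch_size).
    x = -1.0
    for t in range(1, max_iters + 1):
        noise = rng.normal(0.0, noise_std / np.sqrt(batch_size))
        x -= eta * (grad(x) + noise)
        if x > 0.0:
            return t
    return max_iters  # trapped: budget exhausted before escaping

# Escape time from a local minimizer grows exponentially as the
# effective noise shrinks, so modest batch-size increases trap the iterate.
for B in (1, 4, 16):
    times = [escape_time(eta=0.1, batch_size=B) for _ in range(10)]
    print(f"batch size {B:3d}: median escape iterations = {int(np.median(times))}")
```

On a typical run the median escape time grows sharply with B, with the largest batch usually exhausting the iteration budget; this is the diffusion-approximation account of why very large batches tend to remain near the minimizer they start from.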

04/26/2017

Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization

In this paper, we study the stochastic gradient descent (SGD) method for...
08/29/2018

Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes

Solving statistical learning problems often involves nonconvex optimizat...
02/09/2016

Poor starting points in machine learning

Poor (even random) starting points for learning/training/optimization ar...
02/14/2018

Toward Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations

Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely ap...
01/22/2020

Stochastic Item Descent Method for Large Scale Equal Circle Packing Problem

Stochastic gradient descent (SGD) is a powerful method for large-scale o...
01/16/2023

Stability Analysis of Sharpness-Aware Minimization

Sharpness-aware minimization (SAM) is a recently proposed training metho...