The Effect of Network Width on the Performance of Large-batch Training

06/11/2018
by Lingjiao Chen, et al.

Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training. Training with large batches can reduce these overheads; however, large batches can affect the convergence properties and generalization performance of SGD. In this work, we take a first step towards analyzing how the structure (width and depth) of a neural network affects the performance of large-batch training. We present new theoretical results which suggest that, for a fixed number of parameters, wider networks are more amenable to fast large-batch training compared to deeper ones. We provide extensive experiments on residual and fully-connected neural networks which suggest that wider networks can be trained using larger batches without incurring a convergence slow-down, unlike their deeper variants.
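To make the width-versus-depth comparison concrete, below is a minimal sketch (not the authors' code) in PyTorch: it builds a "wide" and a "deep" fully-connected network with roughly matched parameter counts and runs one large-batch SGD step on synthetic data. The layer sizes, batch size of 4096, and learning rate are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch: a "wide" vs. a "deep" fully-connected network with
# roughly matched parameter counts (~0.9M each), each given one large-batch
# SGD step on synthetic data. All sizes below are assumptions for illustration.
import torch
import torch.nn as nn

input_dim, num_classes = 784, 10

def count_params(model):
    return sum(p.numel() for p in model.parameters())

# "Wide" variant: a single hidden layer with many units.
wide_net = nn.Sequential(
    nn.Linear(input_dim, 1160), nn.ReLU(),
    nn.Linear(1160, num_classes),
)

# "Deep" variant: several narrower hidden layers, sized so the total
# parameter count is roughly the same as the wide variant.
deep_net = nn.Sequential(
    nn.Linear(input_dim, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_classes),
)

print("wide params:", count_params(wide_net))
print("deep params:", count_params(deep_net))

def large_batch_step(model, batch_size=4096, lr=0.1):
    """One large-batch SGD step on synthetic data (stand-in for a real loader)."""
    x = torch.randn(batch_size, input_dim)
    y = torch.randint(0, num_classes, (batch_size,))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

for name, model in [("wide", wide_net), ("deep", deep_net)]:
    print(name, "loss after one large-batch step:", large_batch_step(model))
```

In an actual study one would sweep the batch size and compare the number of steps to a target loss for each architecture; the snippet only fixes the setup in which such a comparison would be run.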


Related research

04/15/2019  The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
03/11/2019  Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
10/18/2019  Improving the convergence of SGD through adaptive batch sizes
12/20/2021  The effective noise of Stochastic Gradient Descent
02/26/2020  Stagewise Enlargement of Batch Size for SGD-based Learning
01/27/2019  Augment your batch: better training with larger batches
06/07/2021  Batch Normalization Orthogonalizes Representations in Deep Random Networks
