Parallel Dither and Dropout for Regularising Deep Neural Networks

08/28/2015
by Andrew J. R. Simpson, et al.

Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging, and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch-SGD are substantially better than what is possible with batch-SGD. Furthermore, our results demonstrate that dither and dropout are complementary.
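The abstract does not spell out the mechanics of the parallel method, so the following is only a minimal sketch under one assumed reading: a single training example is replicated several times, each replica receives its own dither noise and dropout mask, and the gradient is averaged over those replicas rather than over different examples. The function names, the uniform noise, the unscaled dropout, and the single linear unit with squared error are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random

def dither(x, scale, rng):
    """Add zero-mean uniform noise in [-scale, scale] to each input (assumed noise model)."""
    return [v + rng.uniform(-scale, scale) for v in x]

def dropout(x, drop_prob, rng):
    """Zero each element independently with probability drop_prob (no rescaling)."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def parallel_replicas(x, n_replicas, scale, drop_prob, rng):
    """Build n_replicas independently regularised copies of ONE training example."""
    return [dropout(dither(x, scale, rng), drop_prob, rng)
            for _ in range(n_replicas)]

def averaged_gradient(w, x, target, n_replicas, scale, drop_prob, rng):
    """Gradient of 0.5 * (w.x - target)**2 for a linear unit,
    averaged over the regularised replicas of a single example."""
    grad = [0.0] * len(w)
    for xr in parallel_replicas(x, n_replicas, scale, drop_prob, rng):
        err = sum(wi * xi for wi, xi in zip(w, xr)) - target
        for i, xi in enumerate(xr):
            grad[i] += err * xi / n_replicas
    return grad

# Usage: one non-batch SGD step driven by a single example.
rng = random.Random(0)
w = [0.1, -0.2, 0.3]
g = averaged_gradient(w, [1.0, 2.0, 3.0], target=1.0,
                      n_replicas=8, scale=0.1, drop_prob=0.5, rng=rng)
w = [wi - 0.01 * gi for wi, gi in zip(w, g)]
```

With `scale=0` and `drop_prob=0` every replica equals the original example, so the averaged gradient reduces to the ordinary single-example gradient; the regularisation comes entirely from the variation between replicas.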
