Online Batch Selection for Faster Training of Neural Networks

11/19/2015
by Ilya Loshchilov, et al.

Deep neural networks are commonly trained using stochastic non-convex optimization procedures, which are driven by gradient information estimated on fractions (batches) of the dataset. While it is commonly accepted that batch size is an important parameter for offline tuning, the benefits of online selection of batches remain poorly understood. We investigate online batch selection strategies for two state-of-the-art methods of stochastic gradient-based optimization, AdaDelta and Adam. Since the loss function to be minimized over the whole dataset is an aggregation of the loss functions of individual datapoints, intuitively, datapoints with the greatest loss should be considered (selected in a batch) more frequently. However, the limitations of this intuition and the proper control of the selection pressure over time remain open questions. We propose a simple strategy in which all datapoints are ranked with respect to their latest known loss value and the probability of being selected decays exponentially as a function of rank. Our experimental results on the MNIST dataset suggest that online batch selection speeds up both AdaDelta and Adam by a factor of about 5.
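As a rough illustration of the idea described in the abstract, the sketch below samples a batch by ranking datapoints by their latest known loss and drawing indices with a probability that decays exponentially with rank. It is not the authors' exact implementation: the `selection_pressure` parameter, which sets the approximate ratio between the selection probabilities of the highest- and lowest-ranked datapoints, is an assumed parameterization, and the abstract notes that properly controlling this pressure over time is itself an open question.

```python
import numpy as np

def rank_probabilities(n, selection_pressure=100.0):
    """Probability of selecting the i-th ranked datapoint, decaying
    exponentially with rank. The ratio between rank 0 (highest loss)
    and rank n-1 (lowest loss) is approximately `selection_pressure`
    (an assumed knob for this sketch)."""
    ranks = np.arange(n)
    p = np.exp(-ranks * np.log(selection_pressure) / n)
    return p / p.sum()

def sample_batch(latest_losses, batch_size, selection_pressure=100.0, rng=None):
    """Rank datapoints by latest known loss (highest first) and sample
    a batch of indices with exponentially decaying rank probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(-np.asarray(latest_losses))  # indices, highest loss first
    probs = rank_probabilities(len(latest_losses), selection_pressure)
    return rng.choice(order, size=batch_size, replace=False, p=probs)

# Example: 10,000 datapoints, a batch of 64 biased toward high-loss examples.
losses = np.random.rand(10_000)
batch_idx = sample_batch(losses, batch_size=64)
```

In practice the per-datapoint losses would be refreshed whenever a datapoint is evaluated during training, so the ranking reflects the latest known loss rather than a stale snapshot.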


