Not All Samples Are Created Equal: Deep Learning with Importance Sampling

by Angelos Katharopoulos, et al.

Deep neural network training spends most of its computation on examples that are already handled correctly and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second, we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally, on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the training losses of up to an order of magnitude and a relative improvement of test errors between 5
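To make the idea of importance-sampled SGD concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names `importance_sample` and `weighted_gradient` are hypothetical, and `scores` is only a stand-in for whatever per-sample importance measure is used (the abstract mentions an upper bound on the per-sample gradient norm, which is not reproduced here). The sketch shows the generic recipe: draw examples with probability proportional to their score, then reweight each drawn gradient by 1 / (N * p_i) so that the minibatch estimate stays unbiased for the mean gradient.

```python
import numpy as np


def importance_sample(scores, batch_size, rng):
    """Draw indices with probability proportional to `scores`.

    `scores` is an assumed, non-negative per-sample importance measure
    (a placeholder for a gradient-norm bound). Returns the sampled
    indices and the weights that keep the gradient estimate unbiased.
    """
    scores = np.asarray(scores, dtype=float)
    probs = scores / scores.sum()
    n = len(scores)
    idx = rng.choice(n, size=batch_size, replace=True, p=probs)
    # Weight 1 / (N * p_i) compensates for non-uniform sampling.
    weights = 1.0 / (n * probs[idx])
    return idx, weights


def weighted_gradient(per_sample_grads, idx, weights):
    """Average the sampled per-sample gradients after reweighting."""
    return (weights[:, None] * per_sample_grads[idx]).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(50, 3))               # toy per-sample gradients
    scores = np.linalg.norm(grads, axis=1) + 1e-3  # toy importance scores
    idx, w = importance_sample(scores, 8, rng)
    print(weighted_gradient(grads, idx, w))
```

With uniform scores every weight equals 1 and the procedure reduces to ordinary uniform minibatch SGD, which is why such a scheme can in principle be toggled on or off during training.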




Variance Reduction in SGD by Distributed Importance Sampling

Humans are able to accelerate their learning by selecting training mater...

Importance Sampling for Minibatches

Minibatching is a very well studied and highly popular technique in supe...

Finite-sample Guarantees for Winsorized Importance Sampling

Importance sampling is a widely used technique to estimate the propertie...

Learning Optimal Flows for Non-Equilibrium Importance Sampling

Many applications in computational sciences and statistical inference re...

How Important is Importance Sampling for Deep Budgeted Training?

Long iterative training processes for Deep Neural Networks (DNNs) are co...

Policy Improvement for POMDPs Using Normalized Importance Sampling

We present a new method for estimating the expected return of a POMDP fr...

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

Reparameterization (RP) and likelihood ratio (LR) gradient estimators ar...

Code Repositories


Code for experiments regarding importance sampling for training neural networks
