Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

03/12/2018
by Yeming Wen, et al.

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example. Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. We find significant speedups in training neural networks with multiplicative Gaussian perturbations. We show that flipout is effective at regularizing LSTMs, and outperforms previous methods. Flipout also enables us to vectorize evolution strategies: in our experiments, a single GPU with flipout can handle the same throughput as at least 40 CPU cores using existing methods, equivalent to a factor-of-4 cost reduction on Amazon Web Services.
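The core trick behind flipout: sample a single weight perturbation per mini-batch, then decorrelate it across examples by multiplying it elementwise with an independent rank-one random sign matrix r_n s_n^T for each example n. Because the base perturbation is symmetric around zero, the sign flips preserve its marginal distribution while making the per-example perturbations pseudo-independent, and the whole thing can be computed with two matrix multiplies per layer. Below is a minimal NumPy sketch of this idea for a fully connected layer with Gaussian perturbations; the function name and arguments are illustrative, not the paper's reference implementation.

```python
import numpy as np

def flipout_linear(X, W_mean, W_std, rng):
    """Flipout forward pass for a fully connected layer with Gaussian
    weight perturbations W = W_mean + W_std * eps (eps ~ N(0, 1)).

    X:      (batch, d_in)  inputs
    W_mean: (d_out, d_in)  mean weights
    W_std:  (d_out, d_in)  elementwise perturbation scale
    """
    batch, d_in = X.shape
    d_out = W_mean.shape[0]

    # One shared perturbation for the whole mini-batch.
    delta_W = W_std * rng.standard_normal((d_out, d_in))

    # Independent random sign vectors per example.
    S = rng.choice([-1.0, 1.0], size=(batch, d_in))   # s_n
    R = rng.choice([-1.0, 1.0], size=(batch, d_out))  # r_n

    # y_n = W_mean x_n + (delta_W (x_n * s_n)) * r_n
    return X @ W_mean.T + ((X * S) @ delta_W.T) * R

# Example usage (shapes only; values are arbitrary):
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 128))
W_mean = rng.standard_normal((64, 128)) * 0.1
W_std = np.full((64, 128), 0.05)
Y = flipout_linear(X, W_mean, W_std, rng)  # (32, 64)
```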


Related research

05/19/2017 · EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD
We present a generic framework for trading off fidelity and cost in comp...

05/03/2020 · Adaptive Learning of the Optimal Mini-Batch Size of SGD
Recent advances in the theoretical understanding of SGD (Qian et al., 201...

06/19/2017 · An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
Training of neural machine translation (NMT) models usually uses mini-ba...

06/11/2021 · Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs
Graph neural networks (GNNs) are powerful tools for learning from graph ...

03/09/2020 · Amortized variance reduction for doubly stochastic objectives
Approximate inference in complex probabilistic models such as deep Gauss...

01/07/2020 · Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm ...

04/21/2023 · Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
We propose an evolution strategies-based algorithm for estimating gradie...
