Deep networks with probabilistic gates

12/11/2018
by Charles Herrmann, et al.

We investigate learning to probabilistically bypass computations in a network architecture. Our approach is motivated by AIG, where layers are conditionally executed depending on their inputs, and the network is trained against a target bypass rate using a per-layer loss. We propose a per-batch loss function and describe strategies for handling probabilistic bypass during both training and inference. The per-batch loss allows the network additional flexibility. In particular, a form of mode collapse becomes plausible, where some layers are nearly always bypassed and some almost never; such a configuration is strongly discouraged by AIG's per-layer loss. We explore several inference-time strategies, including the natural MAP approach. With data-dependent bypass, we demonstrate improved performance over AIG. With data-independent bypass, as in stochastic depth, we observe mode collapse and effectively prune layers. We demonstrate our techniques on ResNet-50 and ResNet-101 for ImageNet, where they produce improved accuracy (0.15--0.41%) along with substantially less computation (bypassing 25--40% of the layers).
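
As a rough illustration only (not the authors' code), the PyTorch-style sketch below shows one way a probabilistically gated residual block and a per-batch bypass-rate loss could be wired up; the names (GatedResidualBlock, batch_bypass_loss, target_exec_rate) and the straight-through gating trick are assumptions made for this sketch, not details taken from the paper.

```python
# Hypothetical sketch: a residual block with a data-dependent probabilistic
# gate, plus a per-batch bypass-rate loss. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Small gating head: predicts the probability of executing the block
        # from a global summary of the input (data-dependent bypass).
        self.gate = nn.Linear(channels, 1)

    def forward(self, x):
        # Execution probability p in (0, 1), one value per example.
        summary = F.adaptive_avg_pool2d(x, 1).flatten(1)
        p = torch.sigmoid(self.gate(summary))            # shape (N, 1)
        if self.training:
            # Sample a hard 0/1 decision but keep a gradient path to p
            # via a straight-through estimator.
            g = torch.bernoulli(p)
            g = g + p - p.detach()
        else:
            # MAP-style inference: execute the block iff p >= 0.5.
            g = (p >= 0.5).float()
        g = g.view(-1, 1, 1, 1)
        return F.relu(x + g * self.body(x)), p

def batch_bypass_loss(exec_probs, target_exec_rate=0.7):
    # Per-batch loss: only the average execution rate over all gates and all
    # examples in the batch is pushed toward the target, so individual layers
    # remain free to be almost always or almost never executed (the mode
    # collapse discussed in the abstract), unlike a per-layer loss.
    mean_rate = torch.cat([p.flatten() for p in exec_probs]).mean()
    return (mean_rate - target_exec_rate) ** 2
```

In training, the bypass-rate term would be added to the usual classification loss, with the per-example gate probabilities from every gated block collected into `exec_probs`.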


