Characterizing signal propagation to close the performance gap in unnormalized ResNets

by Andrew Brock, et al.

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.
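To make the abstract's central claim concrete, here is a minimal NumPy sketch of plain weight standardization together with one signal-propagation statistic, the average squared per-channel mean. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how their adapted version differs, so the code uses the generic form (zero mean and fixed norm over each output unit's fan-in), and the names `standardize_weights` and `avg_channel_squared_mean` are hypothetical.

```python
import numpy as np


def standardize_weights(w, eps=1e-8):
    """Generic weight standardization for a dense layer (assumed form).

    w has shape (fan_in, fan_out). Each output unit's weights are shifted
    to zero mean over the fan-in and rescaled to variance ~1 / fan_in.
    """
    fan_in = w.shape[0]
    mean = w.mean(axis=0, keepdims=True)
    var = w.var(axis=0, keepdims=True)
    return (w - mean) / np.sqrt(fan_in * (var + eps))


def avg_channel_squared_mean(x):
    """Average over channels of the squared per-channel batch mean.

    x has shape (batch, channels). Large values indicate a mean shift.
    """
    return float(np.mean(x.mean(axis=0) ** 2))


rng = np.random.default_rng(0)

# Non-negative inputs, as produced by a preceding ReLU layer.
x = np.abs(rng.standard_normal((1024, 64)))

# The same random weights, with and without standardization.
w_raw = rng.standard_normal((64, 32)) / np.sqrt(64)
w_std = standardize_weights(w_raw)

# Statistic measured on the pre-activations of the next layer.
shift_raw = avg_channel_squared_mean(x @ w_raw)
shift_std = avg_channel_squared_mean(x @ w_std)
print(f"raw weights:          {shift_raw:.4f}")
print(f"standardized weights: {shift_std:.4f}")
```

Because each standardized column sums to zero, a mean shared by all input channels cancels in the next layer's pre-activations, which is the intuition behind the claim that per-channel means do not grow with depth. The toy above is a single dense layer; a residual network would need the same statistic tracked block by block.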




Shifting Mean Activation Towards Zero with Bipolar Activation Functions

We propose a simple extension to the ReLU-family of activation functions...

Farkas layers: don't shift the data, fix the geometry

Successfully training deep neural networks often requires either batch n...

Static Activation Function Normalization

Recent seminal work at the intersection of deep neural networks practice...

Fast Certified Robust Training via Better Initialization and Shorter Warmup

Recently, bound propagation based certified adversarial defense have bee...

An Inertial Newton Algorithm for Deep Learning

We devise a learning algorithm for possibly nonsmooth deep neural networ...

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers

Training very deep neural networks is still an extremely challenging tas...

Evolving Normalization-Activation Layers

Normalization layers and activation functions are critical components in...