How Does BN Increase Collapsed Neural Network Filters?

01/30/2020
by Sheng Zhou, et al.

Improving the sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs that combine batch normalization (BN) with rectified linear activation functions (e.g., ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularization such as L_1. The phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and thereby reduces the network's capacity. It becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise, and that the sparsifying probability is proportional to the square of the learning rate and inversely proportional to the square of the BN scale parameter. To prevent undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while automatically making BN parameters trainable again as they saturate during training. With psBN we can recover collapsed filters and improve model performance on various tasks, such as classification on CIFAR-10 and object detection on MS-COCO 2017.
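The analytical claim in the abstract can be written compactly: if η denotes the SGD learning rate and γ the BN scale parameter of a channel, the probability that a noisy update pushes that channel into the non-trainable (collapsed) region scales as

    P(collapse) ∝ η^2 / γ^2

so large learning rates and already-small BN scales make collapse more likely.

Below is a minimal diagnostic sketch (not the authors' code) of how filter collapse can be observed in practice: with ReLU-like activations, a BN channel whose scale γ has shrunk to nearly zero produces an almost constant output, so the corresponding filter no longer contributes. The function name count_collapsed_filters and the 1e-3 threshold are illustrative choices, written here in PyTorch.

    import torch.nn as nn

    def count_collapsed_filters(model: nn.Module, tol: float = 1e-3):
        """Count BN channels whose scale |gamma| has fallen below tol."""
        collapsed, total = 0, 0
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                gamma = m.weight.detach().abs()  # BN scale parameters (gamma)
                collapsed += int((gamma < tol).sum())
                total += gamma.numel()
        return collapsed, total

    # Example: track this count during training; a growing fraction of
    # near-zero gammas indicates the filter-collapse effect described above.
    net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
    print(count_collapsed_filters(net))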


Related research

- 04/23/2023: The Disharmony Between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation Between Activations
- 01/15/2020: Filter Grafting for Deep Neural Networks
- 07/01/2021: On the Expected Complexity of Maxout Networks
- 07/09/2019: Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
- 06/01/2022: Rotate the ReLU to implicitly sparsify deep networks
- 05/22/2018: ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets
- 10/23/2020: Population Gradients improve performance across data-sets and architectures in object classification
