Static Activation Function Normalization

05/03/2019
by Pierre H. Richemond, et al.

Recent seminal work at the intersection of deep neural network practice and random matrix theory has linked the convergence speed and robustness of these networks to the combination of random weight initialization and nonlinear activation function in use. Building on those principles, we introduce a process that transforms an existing activation function into another one with better properties. We term this transform static activation normalization. More specifically, we focus on this normalization applied to the ReLU unit, and show empirically that it significantly improves convergence robustness, maximum trainable depth, and anytime performance. We verify these claims by examining the empirical eigenvalue distributions of networks trained with those activations. Our static activation normalization provides a first step towards benefits similar in spirit to those of schemes like batch normalization, but without the computational cost.
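To give a concrete, hedged sense of what a static (data-independent) normalization of ReLU could look like, the sketch below rescales the unit to zero mean and unit variance under a standard Gaussian pre-activation. The closed-form constants and the name static_normalized_relu are illustrative assumptions, not the paper's exact transform.

    import numpy as np

    def static_normalized_relu(x):
        # Rescale ReLU to zero mean and unit variance under x ~ N(0, 1):
        #   E[max(0, x)]   = 1 / sqrt(2*pi)
        #   Var[max(0, x)] = 1/2 - 1/(2*pi)
        # (Illustrative constants; the paper defines its own transform.)
        mean = 1.0 / np.sqrt(2.0 * np.pi)
        std = np.sqrt(0.5 - 1.0 / (2.0 * np.pi))
        return (np.maximum(x, 0.0) - mean) / std

    # Empirical check of the normalization under Gaussian inputs.
    z = np.random.randn(1_000_000)
    y = static_normalized_relu(z)
    print(f"mean ~ {y.mean():.3f}, std ~ {y.std():.3f}")  # ~0.000, ~1.000

Because the constants here are fixed in closed form rather than estimated from minibatch statistics, such a transform adds no runtime overhead, which is the contrast with batch normalization drawn in the abstract.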
