On the Selection of Initialization and Activation Function for Deep Neural Networks

05/21/2018
by Soufiane Hayou, et al.

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the learning procedure. An inappropriate selection can lead to the loss of information about the input during forward propagation and to exponentially vanishing or exploding gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters, known as the 'edge of chaos', can lead to good performance. We complete these recent results by showing quantitatively that, for a class of ReLU-like activation functions, information indeed propagates deeper when the network is initialized on the edge of chaos. By extending our analysis to a larger class of functions, we then identify an activation function, ϕ_new(x) = x · sigmoid(x), which improves information propagation over ReLU-like functions and does not suffer from the vanishing gradient problem. We demonstrate empirically that this activation function, combined with a random initialization on the edge of chaos, outperforms standard approaches. This complements recent independent work by Ramachandran et al. (2017), who observed empirically, through extensive simulations, that this activation function performs better than many alternatives.
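For concreteness, the sketch below (not the authors' code) shows the activation ϕ_new(x) = x · sigmoid(x) and an edge-of-chaos style Gaussian initialization in NumPy. The variance choice σ_w² = 2, σ_b² = 0 corresponds to the edge of chaos for ReLU-like activations and is used here as an illustrative assumption; the function names are hypothetical.

```python
import numpy as np

def swish(x):
    """phi_new(x) = x * sigmoid(x), the activation discussed in the abstract."""
    return x / (1.0 + np.exp(-x))

def init_edge_of_chaos(fan_in, fan_out, sigma_w2=2.0, sigma_b2=0.0, rng=None):
    """Gaussian initialization on the 'edge of chaos'.

    For ReLU-like activations the edge of chaos corresponds to
    (sigma_b^2, sigma_w^2) = (0, 2); other activations would need
    different values (assumption for illustration only).
    """
    rng = np.random.default_rng() if rng is None else rng
    W = rng.normal(0.0, np.sqrt(sigma_w2 / fan_in), size=(fan_out, fan_in))
    b = rng.normal(0.0, np.sqrt(sigma_b2), size=fan_out)
    return W, b

# Forward pass through one randomly initialized layer of width 128.
x = np.random.randn(128)
W, b = init_edge_of_chaos(fan_in=128, fan_out=128)
h = swish(W @ x + b)
```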


