Activation function design for deep networks: linearity and effective initialisation

by Michael Murray, et al.

The activation function deployed in a deep neural network has great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b^2 of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy as well as training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.
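As an illustration of the kind of activation the abstract describes, the sketch below defines a function that is exactly the identity on a linear region [-a, a] around the origin and tanh-shaped outside it, with the pieces joined continuously and differentiably. The name `linearized_tanh` and the choice of tanh for the saturating tails are illustrative assumptions, not the paper's specific construction; the point is only that the width `a` of the linear region is an explicit design parameter that can be sized relative to the bias variance σ_b^2.

```python
import numpy as np

def linearized_tanh(x, a=1.0):
    """Activation that is the identity on [-a, a] and tanh-shaped outside.

    At |x| = a the two pieces agree in value (both equal a in magnitude)
    and in slope (both have derivative 1), so the function is C^1.
    This is a hypothetical example of an activation with a tunable
    linear region, not the construction from the paper.
    """
    x = np.asarray(x, dtype=float)
    return np.where(
        np.abs(x) <= a,
        x,  # linear region around the origin
        np.sign(x) * (a + np.tanh(np.abs(x) - a)),  # saturating tails
    )

# Preactivations drawn with a small bias variance mostly land inside
# the linear region, where the activation acts as the identity and so
# preserves pairwise input correlations at initialisation.
rng = np.random.default_rng(0)
z = rng.normal(scale=0.5, size=10_000)  # illustrative preactivation scale
frac_linear = np.mean(np.abs(z) <= 1.0)
```

Widening `a` (relative to the typical preactivation scale set by the weight and bias variances) increases the fraction of inputs processed linearly, which is the mechanism the paper's analysis relies on.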








Code Repositories


Activation function design: Linearity and effective initialization
