Activation function design for deep networks: linearity and effective initialisation

05/17/2021
by Michael Murray et al.

The activation function deployed in a deep neural network has a great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both of these problems can be avoided by choosing an activation function with a sufficiently large linear region around the origin, relative to the bias variance σ_b^2 of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.

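To make the abstract's design criterion concrete, the sketch below (NumPy only, independent of the authors' code) defines a hypothetical activation that is exactly the identity on [-s, s] and saturates like tanh outside, then pushes two correlated inputs through a deep random fully connected network with weight variance σ_w²/width and bias variance σ_b², printing how strongly their pairwise correlation has converged after 50 layers. The function names, the tanh tails and all hyperparameter values are illustrative assumptions, not the construction analysed in the paper; the authors' own implementation is linked under Code Repositories below.

```python
import numpy as np


def linear_core_activation(x, s=1.0):
    """Hypothetical activation: identity on [-s, s], tanh-like saturation outside.

    Chosen purely to illustrate "a linear region around the origin"; it is not
    the activation family constructed in the paper.
    """
    return np.where(np.abs(x) <= s, x, np.sign(x) * (s + np.tanh(np.abs(x) - s)))


def correlation_through_depth(phi, depth=50, width=1000,
                              sigma_w=1.0, sigma_b=0.5, rho0=0.5, seed=0):
    """Push two correlated inputs through a random deep MLP and record the
    empirical correlation of their pre-activations at each layer.

    sigma_b is the bias standard deviation, so the bias variance is sigma_b**2.
    """
    rng = np.random.default_rng(seed)
    # Two inputs with initial correlation roughly rho0.
    u = rng.standard_normal(width)
    v = rho0 * u + np.sqrt(1.0 - rho0 ** 2) * rng.standard_normal(width)
    x1, x2 = u, v
    corrs = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        h1, h2 = W @ phi(x1) + b, W @ phi(x2) + b
        corrs.append(np.corrcoef(h1, h2)[0, 1])
        x1, x2 = h1, h2
    return corrs


if __name__ == "__main__":
    # Under these illustrative settings, a wider linear region (larger s)
    # relative to the bias variance slows the drift of pairwise correlations
    # towards a common value as depth grows.
    for s in (0.1, 1.0, 5.0):
        final = correlation_through_depth(lambda x: linear_core_activation(x, s))[-1]
        print(f"linear region half-width {s:>4}: correlation after 50 layers = {final:.3f}")
```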

Related Research

09/03/2018  PLU: The Piecewise Linear Unit Activation Function
Successive linear transforms followed by nonlinear "activation" function...

05/21/2018  On the Selection of Initialization and Activation Function for Deep Neural Networks
The weight initialization and the activation function of deep neural net...

11/27/2021  Why MDAC? A Multi-domain Activation Function
In this study, a novel, general and ingenious activation function termed...

11/27/2021  AIS: A nonlinear activation function for industrial safety engineering
In the task of Chinese named entity recognition based on deep learning, ...

08/06/2020  The nlogistic-sigmoid function
The variants of the logistic-sigmoid functions used in artificial neural...

09/21/2020  Reservoir Computing and its Sensitivity to Symmetry in the Activation Function
Reservoir computing has repeatedly been shown to be extremely successful...

01/09/2019  A Constructive Approach for One-Shot Training of Neural Networks Using Hypercube-Based Topological Coverings
In this paper we presented a novel constructive approach for training de...

Code Repositories

AFLI

Activation function design: Linearity and effective initialization

