Activation function design for deep networks: linearity and effective initialisation

05/17/2021
by Michael Murray, et al.

The activation function deployed in a deep neural network has a great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b^2 of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy as well as training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.
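As a rough illustration of the idea (not the construction or experiments from the paper), the sketch below defines a hypothetical activation that is exactly linear on [-a, a] and tanh-shaped outside, then propagates a pair of correlated inputs through a randomly initialised network to watch how their correlation evolves with depth. The function names, the width of the linear region, and all hyperparameter choices here are assumptions made purely for illustration.

```python
import numpy as np

def linearised_tanh(x, a=1.0):
    """Hypothetical activation: identity on [-a, a], tanh-shaped saturation outside.

    The linear region's half-width a is the knob the paper's analysis suggests
    should be large relative to the bias standard deviation sigma_b.
    """
    return np.where(np.abs(x) <= a, x, np.sign(x) * (a + np.tanh(np.abs(x) - a)))

def correlation_vs_depth(phi, depth=50, width=1000, sigma_w=1.0, sigma_b=0.5,
                         rho0=0.5, seed=0):
    """Propagate two correlated inputs through a random net and record their
    cosine similarity after each layer, as a proxy for the pairwise-correlation
    dynamics at initialisation."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    y = rho0 * x + np.sqrt(1.0 - rho0**2) * rng.standard_normal(width)
    rhos = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        x = phi(W @ x + b)
        y = phi(W @ y + b)
        rhos.append(float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y))))
    return rhos

# Compare a wide linear region (a = 2) against plain tanh, which has essentially
# no flat linear region around the origin.
print(correlation_vs_depth(lambda z: linearised_tanh(z, a=2.0))[-1])
print(correlation_vs_depth(np.tanh)[-1])
```

Under this toy setup one would expect the layer-wise correlation to drift toward its fixed point more slowly when the linear region is wide relative to σ_b, which is the qualitative behaviour the paper's analysis predicts; the exact numbers depend on the assumed widths and variances above.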


