Shifting Mean Activation Towards Zero with Bipolar Activation Functions

09/12/2017
by Lars Eidnes, et al.

We propose a simple extension to the ReLU-family of activation functions that allows them to shift the mean activation across a layer towards zero. Combined with proper weight initialization, this alleviates the need for normalization layers. We explore the training of deep vanilla recurrent neural networks (RNNs) with up to 144 layers, and show that bipolar activation functions help learning in this setting. On the Penn Treebank and Text8 language modeling tasks we obtain competitive results, improving on the best reported results for non-gated networks.
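As a rough sketch of the idea described in the abstract (not code taken from the paper), a bipolar variant of ReLU can be written in PyTorch as below: half of a layer's units apply f(x) and the other half apply -f(-x), which pulls the mean activation across the layer towards zero. The even/odd split of units and the helper name `bipolar_relu` are assumptions made here for illustration.

```python
import torch

def bipolar_relu(x: torch.Tensor) -> torch.Tensor:
    """Bipolar ReLU over the last (feature) dimension.

    Even-indexed units get relu(x); odd-indexed units get -relu(-x),
    so roughly half the units produce non-positive outputs and the
    layer's mean activation is shifted towards zero.
    """
    out = torch.empty_like(x)
    out[..., 0::2] = torch.relu(x[..., 0::2])
    out[..., 1::2] = -torch.relu(-x[..., 1::2])
    return out

# Example: on random input, plain ReLU has a clearly positive mean,
# while the bipolar variant's mean is close to zero.
x = torch.randn(4, 256)
print(torch.relu(x).mean().item(), bipolar_relu(x).mean().item())
```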

