Soft-Root-Sign Activation Function

03/01/2020
by Yuan Zhou, et al.

The choice of activation function in deep networks has a significant effect on training dynamics and task performance. At present, the most effective and widely used activation function is ReLU. However, because of its non-zero mean, missing negative values, and unbounded output, ReLU is at a potential disadvantage during optimization. To this end, we introduce a novel activation function that overcomes these three challenges. The proposed nonlinearity, named the "Soft-Root-Sign" (SRS), is smooth, non-monotonic, and bounded. Notably, the bounded property of SRS distinguishes it from most state-of-the-art activation functions. In contrast to ReLU, SRS can adaptively adjust its output through a pair of independent trainable parameters to capture negative information and provide a zero-mean property, which leads not only to better generalization performance but also to faster learning. SRS also avoids the output distribution being confined to the non-negative real numbers, making it more compatible with batch normalization (BN) and less sensitive to initialization. In experiments, we evaluated SRS on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modelling. Our SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities, showing that the proposed activation function generalizes well and achieves high performance across tasks. An ablation study further verified its compatibility with BN and its self-adaptability under different initializations.
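As a concrete illustration, the sketch below is a minimal PyTorch-style implementation of SRS as a trainable module, assuming the functional form reported in the paper, SRS(x) = x / (x/alpha + exp(-x/beta)), where alpha and beta are the pair of independent trainable parameters described in the abstract. The initial values for alpha and beta are illustrative defaults, not taken from a specific experiment in the paper.

import torch
import torch.nn as nn


class SoftRootSign(nn.Module):
    """Soft-Root-Sign activation: SRS(x) = x / (x/alpha + exp(-x/beta)).

    alpha and beta are independent trainable parameters, learned jointly
    with the rest of the network. Initial values here are illustrative.
    """

    def __init__(self, alpha: float = 2.0, beta: float = 3.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Smooth, non-monotonic, bounded response; negative inputs are
        # attenuated rather than zeroed out as in ReLU.
        return x / (x / self.alpha + torch.exp(-x / self.beta))


if __name__ == "__main__":
    # Quick usage check: apply SRS to a range of inputs.
    act = SoftRootSign()
    x = torch.linspace(-5.0, 5.0, steps=11)
    print(act(x))

Because alpha and beta are registered as nn.Parameter, they receive gradients and are updated by the optimizer along with the network weights, which is how the activation can adapt its shape during training.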

