SERF: Towards better training of deep neural networks using log-Softplus ERror activation Function

08/21/2021
by Sayan Nag, et al.

Activation functions play a pivotal role in determining the training dynamics and performance of neural networks. The widely adopted ReLU, despite being simple and effective, has a few disadvantages, including the dying ReLU problem. To tackle such problems, we propose a novel activation function called Serf, which is self-regularized and non-monotonic in nature. Like Mish, Serf also belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification and multimodal entailment) tasks with different state-of-the-art architectures, we observe that Serf vastly outperforms ReLU (baseline) and other activation functions, including both Swish and Mish, with a markedly larger margin on deeper architectures. Ablation studies further demonstrate that Serf-based architectures perform better than those based on Swish and Mish across varying scenarios, validating the effectiveness and compatibility of Serf with varying depth, complexity, optimizers, learning rates, batch sizes, initializers and dropout rates. Finally, we investigate the mathematical relation between Swish and Serf, showing the impact of a preconditioner function ingrained in the first derivative of Serf, which provides a regularization effect that makes gradients smoother and optimization faster.
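
The abstract does not reproduce the activation's closed form. As the name (log-Softplus ERror Function) and the stated kinship with Swish and Mish suggest, Serf is defined as Serf(x) = x · erf(ln(1 + e^x)), i.e. the input gated by the Gauss error function of its softplus. The PyTorch snippet below is an illustrative sketch under that assumption, not code taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Serf(nn.Module):
    """Illustrative Serf activation: Serf(x) = x * erf(softplus(x)) = x * erf(ln(1 + e^x))."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.softplus computes ln(1 + e^x) in a numerically stable way;
        # gating x with erf of it yields a smooth, non-monotonic curve.
        return x * torch.erf(F.softplus(x))


if __name__ == "__main__":
    act = Serf()
    x = torch.linspace(-5.0, 5.0, steps=11)
    print(act(x))  # slightly negative for moderate negative inputs, near-linear for large positive inputs
```

Like Swish and Mish, the function is unbounded above and bounded below, which is consistent with the self-regularizing, non-monotonic behaviour described in the abstract.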

Related research

- APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning (09/10/2022)
- Mish: A Self Regularized Non-Monotonic Neural Activation Function (08/23/2019)
- Soft-Root-Sign Activation Function (03/01/2020)
- Swim: A General-Purpose, High-Performing, and Efficient Activation Function for Locomotion Control Tasks (03/05/2023)
- English to Bangla Machine Translation Using Recurrent Neural Network (06/14/2021)
- Linear Oscillation: The Aesthetics of Confusion for Vision Transformer (08/25/2023)
- Norm-preserving Orthogonal Permutation Linear Unit Activation Functions (OPLU) (04/08/2016)
