Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

12/15/2018
by   Hock Hung Chieng, et al.

Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. The Rectified Linear Unit (ReLU) has been widely used and has been the default activation function across the deep learning community since 2012. Despite its popularity, the hard zero property of ReLU prevents negative values from propagating through the network. Consequently, deep neural networks have not benefited from negative representations. This work proposes an activation function called Flatten-T Swish (FTS) that leverages the benefit of negative values. To verify its performance, this study evaluates FTS against ReLU and several recent activation functions. Each activation function is trained on the MNIST dataset using five different deep fully connected neural networks (DFNNs) with depths varying from five to eight layers. For a fair evaluation, all DFNNs use the same configuration settings. Based on the experimental results, FTS with a threshold value of T=-0.20 has the best overall performance. Compared with ReLU, FTS (T=-0.20) improves MNIST classification accuracy by 0.13 [...] slimmer 5-layer, 6-layer, 7-layer and 8-layer DFNNs, respectively. In addition, the study observes that FTS converges twice as fast as ReLU. Although other existing activation functions are also evaluated, this study uses ReLU as the baseline activation function.
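The abstract names FTS and its threshold T but does not spell out the formula. The following is a minimal sketch, assuming the piecewise definition FTS(x) = x * sigmoid(x) + T for x >= 0 and FTS(x) = T otherwise. The function names `flatten_t_swish` and `relu` are illustrative and not taken from the authors' code.

```python
import math


def relu(x: float) -> float:
    """ReLU baseline: negative inputs are clamped to a hard zero."""
    return max(0.0, x)


def flatten_t_swish(x: float, t: float = -0.20) -> float:
    """Sketch of FTS under the assumed definition.

    For x >= 0 the output is the Swish term x * sigmoid(x) shifted by t;
    for x < 0 the output is flattened to the constant threshold t.
    """
    if x >= 0:
        # x * sigmoid(x), written as x / (1 + e^-x); safe because -x <= 0 here
        return x / (1.0 + math.exp(-x)) + t
    return t


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for value in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(value, relu(value), flatten_t_swish(value))
```

Under this assumption, every negative input maps to T rather than to zero, so with T=-0.20 the unit emits a small negative value wherever ReLU would output exactly zero.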
