A New Activation Function for Training Deep Neural Networks to Avoid Local Minimum
Activation functions play a major role in training neural networks, and understanding their advantages and disadvantages is crucial to achieving better results. This paper first introduces common types of nonlinear activation functions and evaluates their characteristics, along with their pros and cons. We focus only on deep neural networks, as they have proven much more difficult to train while avoiding overfitting. We propose a new activation function, named Abhinav, which adds a nonlinearity with parametric coefficients to the Swish activation function. It gives better results on the MNIST dataset. We reason that this is because the model avoids getting stuck in the local minima that arise when only the sigmoidal term of the Swish function is present. The coefficients are adjusted automatically in every iteration: their values are reduced if the error is large, and are sometimes reduced to zero, which removes the corresponding terms from the polynomial. This new activation function can also be generalized to other tasks, including multi-class and multi-label image classification.
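The abstract does not state the exact form of the proposed function, so the following is only a minimal sketch under stated assumptions. It takes the standard Swish definition, x * sigmoid(x), and adds a polynomial term whose coefficients would be learned during training. The function names, the choice of polynomial degree, and the way the polynomial is combined with Swish are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    # Standard Swish activation: x * sigmoid(x).
    return x * sigmoid(x)


def swish_with_polynomial(x: float, coeffs: list) -> float:
    # ASSUMED form (not from the paper): Swish plus a polynomial
    # sum_k coeffs[k] * x**(k + 1) with trainable coefficients.
    poly = sum(c * x ** (k + 1) for k, c in enumerate(coeffs))
    return swish(x) + poly


def coeff_gradients(x: float, n_coeffs: int) -> list:
    # Under the assumed form, the derivative of the output with respect
    # to coeffs[k] is x**(k + 1); an optimizer could use these values
    # to update the coefficients at each iteration.
    return [x ** (k + 1) for k in range(n_coeffs)]
```

In this sketch, a coefficient that training drives to zero drops its term from the polynomial, and with all coefficients at zero the function reduces to plain Swish.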