Normalized Activation Function: Toward Better Convergence

08/29/2022
by Yuan Peiwen, et al.

Activation functions are essential for introducing non-linearity into neural networks. A great number of empirical experiments have validated various activation functions, yet theoretical research on activation functions remains insufficient. In this work, we study the impact of activation functions on the variance of gradients and propose an approach to normalize activation functions so that the variance of the gradient is kept the same across all layers, allowing the neural network to achieve better convergence. First, we extend previous work on the analysis of gradient variance, in which the impact of activation functions is considered only in an idealized initial state that can hardly be preserved during training, and we derive a property that good activation functions should satisfy as far as possible. Second, we offer an approach to normalize activation functions and empirically verify its effectiveness on prevalent activation functions. Our experiments also suggest that the speed of convergence is roughly related to the property derived in the first part. We compare our normalized activation functions against common activation functions, and the results show that our approach consistently outperforms the unnormalized counterparts. For example, normalized Swish outperforms vanilla Swish by 1.2 in accuracy. Our method improves performance by simply replacing activation functions with their normalized counterparts in both fully-connected networks and residual networks.
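The abstract does not spell out the normalization procedure. The sketch below shows one plausible reading of "normalizing" an activation: rescaling it so that its output has roughly zero mean and unit variance when the input follows a standard normal distribution, which in turn helps keep gradient variance comparable across layers. The function name `normalize_activation` and the Monte Carlo estimation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def normalize_activation(f, n_samples=1_000_000, seed=0):
    """Return a rescaled version of activation f so that, for inputs drawn
    from N(0, 1), its output has approximately zero mean and unit variance.
    This is an illustrative interpretation of activation normalization,
    not necessarily the procedure used in the paper."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    y = f(x)
    mu, sigma = y.mean(), y.std()
    return lambda z: (f(z) - mu) / sigma

# Example: a "normalized Swish" built this way; the constants are Monte Carlo
# estimates, not values reported in the paper.
swish = lambda x: x / (1.0 + np.exp(-x))        # Swish: x * sigmoid(x)
norm_swish = normalize_activation(swish)

x = np.random.default_rng(1).standard_normal(100_000)
print(norm_swish(x).mean(), norm_swish(x).std())  # ~0.0, ~1.0
```

In a network, such a normalized activation would simply replace the original one layer by layer, mirroring the drop-in replacement described in the abstract.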


