Activation Adaptation in Neural Networks

01/28/2019
by Farnoush Farhadi, et al.

Many neural network architectures rely on a fixed choice of activation function for each hidden layer. Given the activation function, the network is trained over the bias and weight parameters: the bias captures the center of the activation, and the weights capture its scale. Here we propose to train the network over a shape parameter as well. This view allows each neuron to tune its own activation function and adapt its curvature toward a better prediction. The modification adds only one further equation to back-propagation per neuron. Re-formalizing activation functions as cumulative distribution functions (CDFs) extensively generalizes the class of activation functions. We aim to generalize a broad class of activation functions in order to study i) the skewness and ii) the smoothness of activation functions. We introduce the adaptive Gumbel activation function as a bridge between the Gumbel and sigmoid functions, and a similar approach is used to derive a smooth version of ReLU. Comparison with common activation functions suggests a different data representation, especially in early network layers, and the adaptation also improves prediction.
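For concreteness, here is a minimal PyTorch sketch of a per-neuron trainable activation in the spirit of the abstract. The parameterization f(x) = 1 - (1 + s·e^x)^(-1/s) is our assumption of a CDF-style family that recovers the sigmoid at s = 1 and approaches a Gumbel-type CDF as s → 0+; the paper's exact definition may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGumbel(nn.Module):
    """Activation with one trainable shape parameter per neuron.

    Assumed form (an illustration, not necessarily the paper's exact
    parameterization): f(x) = 1 - (1 + s * exp(x)) ** (-1 / s).
    At s = 1 this reduces to the sigmoid; as s -> 0+ it approaches
    the complementary Gumbel CDF, 1 - exp(-exp(x)).
    """

    def __init__(self, num_features: int):
        super().__init__()
        # Initialize so that softplus(raw_s) = 1, i.e. start as a sigmoid.
        init = math.log(math.e - 1.0)
        self.raw_s = nn.Parameter(torch.full((num_features,), init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softplus keeps the shape parameter strictly positive.
        s = F.softplus(self.raw_s) + 1e-6
        return 1.0 - (1.0 + s * torch.exp(x)).pow(-1.0 / s)

# Drop-in usage: the shape parameters are learned jointly with the
# weights and biases by ordinary back-propagation, adding one extra
# gradient equation per neuron.
model = nn.Sequential(nn.Linear(784, 128), AdaptiveGumbel(128), nn.Linear(128, 10))
```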



Related research

01/16/2023 - Data-aware customization of activation functions reduces neural network error
  Activation functions play critical roles in neural networks, yet current...

03/04/2023 - Lon-eå at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction
  We study the influence of different activation functions in the output l...

06/30/2022 - Consensus Function from an L_p^q-norm Regularization Term for its Use as Adaptive Activation Functions in Neural Networks
  The design of a neural network is usually carried out by defining the nu...

09/10/2016 - Rectifier Neural Network with a Dual-Pathway Architecture for Image Denoising
  Recently deep neural networks based on tanh activation function have sho...

05/18/2016 - Learning activation functions from data using cubic spline interpolation
  Neural networks require a careful design in order to perform properly on...

03/29/2023 - An Over-parameterized Exponential Regression
  Over the past few years, there has been a significant amount of research...

09/15/2023 - Attention-Only Transformers and Implementing MLPs with Attention Heads
  The transformer architecture is widely used in machine learning models a...
