AReLU: Attention-based Rectified Linear Unit

06/24/2020
by Dengsheng Chen, et al.

Element-wise activation functions play a critical role in deep neural networks by affecting both the expressive power and the learning dynamics. Learning-based activation functions have recently gained increasing attention and success. We propose a new perspective on learnable activation functions by formulating them with an element-wise attention mechanism. In each network layer, we devise an attention module which learns an element-wise, sign-based attention map for the pre-activation feature map; the attention map scales each element according to its sign. Combining the attention module with a rectified linear unit (ReLU) results in an amplification of positive elements and a suppression of negative ones, both with learned, data-adaptive parameters. We coin the resulting activation function Attention-based Rectified Linear Unit (AReLU). Since ReLU can be viewed as an identity transformation over its activated part, the attention module essentially learns an element-wise residue of the activated part of the input, which makes network training more resistant to gradient vanishing. The learned attentive activation leads to well-focused activation of relevant regions of a feature map. Through extensive evaluations, we show that AReLU significantly boosts the performance of most mainstream network architectures while introducing only two extra learnable parameters per layer. Notably, AReLU enables fast network training under small learning rates, which makes it especially suited to transfer learning. Our source code has been released (https://github.com/densechen/AReLU).
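
As a rough illustration of the mechanism described above, the following is a minimal PyTorch sketch of an AReLU layer. It assumes the two per-layer parameters are a negative-branch scale (here called alpha, kept inside (0, 1) by clamping) and a positive-branch scale (here called beta, mapped into (1, 2) via a sigmoid); these names, initial values, and squashing choices are illustrative assumptions, and the released code at https://github.com/densechen/AReLU is authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AReLU(nn.Module):
    """Sketch of an Attention-based Rectified Linear Unit.

    Two learnable scalars per layer: one suppresses negative
    pre-activations, the other amplifies positive ones.
    """

    def __init__(self, alpha: float = 0.90, beta: float = 2.0):
        super().__init__()
        # Initial values are illustrative, not prescribed by the abstract.
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep the negative-branch scale in (0, 1) and the positive-branch
        # scale in (1, 2): positives are amplified, negatives suppressed.
        alpha = torch.clamp(self.alpha, min=0.01, max=0.99)
        beta = 1.0 + torch.sigmoid(self.beta)
        # F.relu(x) * beta handles x >= 0; -F.relu(-x) * alpha handles x < 0.
        return F.relu(x) * beta - F.relu(-x) * alpha


# Example: drop-in replacement for nn.ReLU()
act = AReLU()
y = act(torch.randn(4, 8))
```

Used in place of nn.ReLU, such a layer adds exactly two learnable scalars, consistent with the per-layer parameter count quoted in the abstract.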

Related research

10/03/2018 · Weighted Sigmoid Gate Unit for an Activation Function of Deep Neural Network
05/03/2022 · Attentive activation function for improving end-to-end spoofing countermeasure systems
08/25/2023 · Linear Oscillation: The Aesthetics of Confusion for Vision Transformer
11/29/2021 · First Power Linear Unit with Sign
05/22/2018 · Breaking the Activation Function Bottleneck through Adaptive Parameterization
01/25/2021 · Parametric Rectified Power Sigmoid Units: Learning Nonlinear Neural Transfer Analytical Forms
