Attention as Activation

07/15/2020
by Yimian Dai, et al.

Activation functions and attention mechanisms are typically treated as serving different purposes and have evolved along separate lines. However, both concepts can be formulated as a non-linear gating function. Inspired by this similarity, we propose a novel type of activation unit, the attentional activation (ATAC) unit, as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units with such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better at the cost of a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks of varying depth to empirically verify their effectiveness and efficiency. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.
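To make the gating view concrete, the following is a minimal PyTorch sketch of a local channel-attention gate in the spirit of an ATAC unit: a point-wise (1x1) convolution bottleneck produces a per-position, per-channel sigmoid gate that both activates and refines the input feature map. The reduction ratio r, the BatchNorm layers, and all module and parameter names are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class ATACGate(nn.Module):
    # Hypothetical sketch of an attentional activation (ATAC) unit:
    # point-wise (1x1) convolutions aggregate cross-channel context at
    # every spatial position, and a sigmoid gate multiplies the input
    # element-wise. The bottleneck ratio r and BatchNorm layers are
    # assumptions for illustration.
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        hidden = max(channels // r, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),   # point-wise squeeze
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),   # point-wise expand
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),                                 # non-linear gate in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating: simultaneous activation and feature refinement.
        return x * self.gate(x)

# Usage: drop the unit in wherever a ReLU would normally follow a convolution.
x = torch.randn(8, 64, 32, 32)
y = ATACGate(64)(x)
print(y.shape)  # torch.Size([8, 64, 32, 32])

Because the gate depends on the input itself, the unit acts as a non-linear activation while also re-weighting channels locally, which is the sense in which it unifies the two mechanisms described in the abstract.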


Related research

09/08/2020
TanhSoft – a family of activation functions combining Tanh and Softplus
Deep learning at its core, contains functions that are composition of a ...

06/16/2020
SPLASH: Learnable Activation Functions for Improving Accuracy and Adversarial Robustness
We introduce SPLASH units, a class of learnable activation functions sho...

04/08/2018
Comparison of non-linear activation functions for deep neural networks on MNIST classification task
Activation functions play a key role in neural networks so it becomes fu...

08/17/2022
Restructurable Activation Networks
Is it possible to restructure the non-linear activation functions in a d...

12/22/2015
Deep Learning with S-shaped Rectified Linear Activation Units
Rectified linear activation units are important components for state-of-...

03/16/2016
Suppressing the Unusual: towards Robust CNNs using Symmetric Activation Functions
Many deep Convolutional Neural Networks (CNN) make incorrect predictions...

11/27/2018
Dense xUnit Networks
Deep net architectures have constantly evolved over the past few years, ...
