On the Integration of Self-Attention and Convolution

11/29/2021
by Xuran Pan, et al.

Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computation in these two paradigms is in fact done with the same operation. Specifically, we first show that a traditional convolution with kernel size k x k can be decomposed into k^2 individual 1x1 convolutions, followed by shift and summation operations. Then, we interpret the projections of queries, keys, and values in the self-attention module as multiple 1x1 convolutions, followed by the computation of attention weights and the aggregation of values. Therefore, the first stage of both modules comprises a similar operation. More importantly, the first stage accounts for the dominant computational complexity (quadratic in the channel size) compared to the second stage. This observation naturally leads to an elegant integration of these two seemingly distinct paradigms, i.e., a mixed model that enjoys the benefits of both self-Attention and Convolution (ACmix), while having minimal computational overhead compared to its pure convolution or self-attention counterparts. Extensive experiments show that our model achieves consistently improved results over competitive baselines on image recognition and downstream tasks. Code and pre-trained models will be released at https://github.com/Panxuran/ACmix and https://gitee.com/mindspore/models.
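To make the shared first stage concrete, here is a minimal numerical sketch, assuming PyTorch; it is not the authors' released ACmix code, and the tensor shapes, the padding convention, and the global (rather than windowed) Stage II attention are illustrative assumptions. It verifies that a k x k convolution can be reproduced by k^2 individual 1x1 convolutions followed by shift and summation, and shows that the query/key/value projections of self-attention take exactly the same 1x1-convolution form.

```python
# Minimal sketch (assumption: PyTorch; not the authors' released ACmix code).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C_in, C_out, H, W, k = 2, 4, 8, 7, 7, 3
p = k // 2                                     # 'same' zero padding
x = torch.randn(B, C_in, H, W)
weight = torch.randn(C_out, C_in, k, k)        # standard k x k convolution kernel

# Reference: an ordinary k x k convolution.
ref = F.conv2d(x, weight, padding=p)

# Decomposition: one 1x1 convolution per kernel position, then shift and sum.
out = torch.zeros_like(ref)
for i in range(k):
    for j in range(k):
        w_ij = weight[:, :, i, j, None, None]      # (C_out, C_in, 1, 1)
        y = F.conv2d(x, w_ij)                      # a plain 1x1 convolution
        y_pad = F.pad(y, (p, p, p, p))             # zeros emulate the padding
        out += y_pad[:, :, i:i + H, j:j + W]       # shift by (i - p, j - p), sum

print(torch.allclose(out, ref, atol=1e-5))         # True: the decomposition is exact

# Stage I of self-attention: the query/key/value projections are also 1x1
# convolutions, i.e., the same (and dominant-cost) operation as above.
d = 8
w_q, w_k, w_v = (torch.randn(d, C_in, 1, 1) for _ in range(3))
q, key, v = F.conv2d(x, w_q), F.conv2d(x, w_k), F.conv2d(x, w_v)

# Stage II (lightweight): attention weights and value aggregation.
q2, k2, v2 = (t.flatten(2).transpose(1, 2) for t in (q, key, v))   # (B, H*W, d)
attn = torch.softmax(q2 @ k2.transpose(1, 2) / d ** 0.5, dim=-1)   # (B, H*W, H*W)
sa_out = (attn @ v2).transpose(1, 2).reshape(B, d, H, W)
```

Since the 1x1 projections dominate the cost in both paths, they can be shared between the convolution and self-attention branches while only the lightweight second stages remain separate, which is where the claimed low overhead comes from.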

