X-volution: On the unification of convolution and self-attention

06/04/2021
by Xuanhong Chen, et al.

Convolution and self-attention act as two fundamental building blocks in deep neural networks: the former extracts local image features linearly, while the latter non-locally encodes high-order contextual relationships. Although the two operations are essentially complementary (first-order vs. high-order), state-of-the-art architectures such as CNNs and transformers lack a principled way to apply both simultaneously within a single computational module, owing to their heterogeneous computing patterns and the excessive cost of global dot-products in visual tasks. In this work, we theoretically derive a global self-attention approximation scheme that approximates self-attention via convolution operations on transformed features. Based on this approximation, we establish a multi-branch elementary module composed of both convolution and self-attention operations, capable of unifying local and non-local feature interaction. Importantly, once trained, this multi-branch module can be conditionally converted into a single standard convolution via structural re-parameterization, yielding a pure convolution-style operator named X-volution that can be plugged into any modern network as an atomic operation. Extensive experiments demonstrate that the proposed X-volution achieves highly competitive visual understanding improvements (+1.2 to +1.7 box AP and +1.5 mask AP on COCO detection and segmentation).
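The abstract describes three ingredients: an approximation of global self-attention by convolutions on transformed features, a multi-branch module that combines a convolution branch with the approximated-attention branch, and a structural re-parameterization step that folds the trained branches into one convolution. The following is a minimal PyTorch sketch of the multi-branch idea only; it is not the authors' implementation, and the class name XVolutionBlock, the gating used in the attention-approximation branch, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of a two-branch block in the spirit of the abstract.
# NOT the paper's implementation: class name, gating scheme, and
# hyperparameters are assumptions made for illustration only.

import torch
import torch.nn as nn


class XVolutionBlock(nn.Module):
    """Local convolution branch plus a branch that approximates
    self-attention with convolutions applied to transformed features.
    After training, both branches could in principle be folded into a
    single convolution via structural re-parameterization (not shown)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Local (first-order) branch: a plain convolution.
        self.local_conv = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        # Approximate global (high-order) branch: convolutions on
        # element-wise transformed features stand in for dot-product attention.
        self.key_conv = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        self.value_conv = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.local_conv(x)
        # Content-dependent weighting: the input is gated by convolved
        # "key" features, then projected by a "value" convolution.
        approx_attn = x * torch.sigmoid(self.key_conv(x))
        approx_attn = self.value_conv(approx_attn)
        # Sum the two branches, as in a multi-branch elementary module.
        return self.bn(local + approx_attn)


if __name__ == "__main__":
    block = XVolutionBlock(channels=64)
    feats = torch.randn(2, 64, 32, 32)
    print(block(feats).shape)  # torch.Size([2, 64, 32, 32])
```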
