HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

07/28/2022
by Yongming Rao, et al.

Recent progress in vision Transformers demonstrates great success in various tasks, driven by a new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution (g^nConv), which performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable: it is compatible with various variants of convolution and extends the second-order interactions in self-attention to arbitrary orders without introducing significant extra computation. g^nConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on this operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architectures and training configurations. HorNet also shows favorable scalability to more training data and larger model sizes. Beyond its effectiveness in visual encoders, we also show that g^nConv can be applied to task-specific decoders and consistently improves dense-prediction performance with less computation. Our results demonstrate that g^nConv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet
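The core idea, gating a feature map with spatially convolved features and recursing to raise the interaction order, can be illustrated with a minimal NumPy sketch. This is an illustration only, not the authors' implementation: the real g^nConv uses per-order channel dimensions that grow geometrically, an input projection that splits features, and depthwise convolutions in PyTorch (see the linked repository). All function and parameter names below (`gnconv`, `w_dw`, `w_proj`) are hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, w):
    """Per-channel 3x3 convolution with zero padding.
    x: (C, H, W) feature map; w: (C, 3, 3) one kernel per channel."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def gnconv(x, w_dw, w_proj, order=3):
    """Simplified recursive gated convolution (sketch).
    Each iteration multiplies the running features by a spatially
    convolved view of the input, raising the interaction order by one,
    then mixes channels with a 1x1 projection.
    x: (C, H, W); w_dw: (order, C, 3, 3); w_proj: (order, C, C)."""
    p = x
    for k in range(order):
        q = depthwise_conv3x3(x, w_dw[k])            # spatial context
        p = q * p                                    # elementwise gating
        p = np.einsum('oc,chw->ohw', w_proj[k], p)   # channel mixing
    return p

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 5))
w_dw = rng.standard_normal((2, 4, 3, 3))
w_proj = rng.standard_normal((2, 4, 4))
y = gnconv(x, w_dw, w_proj, order=2)   # y.shape == (4, 5, 5)
```

Because every operation except the gating is linear, an `order`-step recursion yields a degree-(order+1) polynomial in the input: doubling `x` scales the order-2 output by 2^3 = 8, which is one way to see that the operation captures interactions beyond the second-order ones of self-attention.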

Related research

- 12/23/2022: A Close Look at Spatial Modeling: From Attention to Convolution
- 07/01/2022: Rethinking Query-Key Pairwise Interactions in Vision Transformers
- 11/09/2021: Sliced Recursive Transformer
- 06/04/2021: X-volution: On the unification of convolution and self-attention
- 11/18/2021: TransMix: Attend to Mix for Vision Transformers
- 11/22/2022: Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring
- 11/06/2021: Convolutional Gated MLP: Combining Convolutions & gMLP
