ParCNetV2: Oversized Kernel with Enhanced Attention

11/14/2022
by Ruihan Xu, et al.

Transformers have achieved tremendous success in various computer vision tasks. By borrowing design concepts from transformers, many studies have revolutionized CNNs and shown remarkable results. This paper falls into this line of studies. Specifically, we introduce a convolutional neural network architecture named ParCNetV2, which extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units. The oversized convolution uses a kernel 2× the input size to model long-range dependencies through a global receptive field. At the same time, it achieves implicit positional encoding by removing the shift-invariant property from convolutional kernels: when the kernel size is twice the input size, the effective kernels at different spatial locations differ. The bifurcate gate unit implements an attention mechanism similar to self-attention in transformers. It splits the input into two branches: one serves as the feature transformation, while the other serves as attention weights, and the attention is applied through element-wise multiplication of the two branches. In addition, we introduce a unified local-global convolution block that unifies the design of the early- and late-stage convolutional blocks. Extensive experiments demonstrate that our method outperforms other pure convolutional neural networks as well as hybrid networks that combine CNNs and transformers.
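The abstract describes the oversized convolution and the bifurcate gate unit only at a high level. The PyTorch sketch below illustrates one plausible 1D reading of both ideas, assuming a depth-wise circular convolution whose kernel is twice the sequence length and a two-branch gate combined by element-wise multiplication. The class names (`OversizedCircularConv`, `BifurcateGateUnit`), the sigmoid gating, and the 1D formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OversizedCircularConv(nn.Module):
    """Depth-wise 1D circular convolution with a kernel ~2x the input length.

    Illustrative sketch (assumption): because the kernel is longer than one
    circular period of the input, the subset of weights that effectively acts
    at each output position differs, which breaks shift invariance and serves
    as an implicit positional encoding.
    """

    def __init__(self, channels: int, input_len: int):
        super().__init__()
        self.kernel_len = 2 * input_len          # oversized: 2x the input size
        self.weight = nn.Parameter(
            torch.randn(channels, 1, self.kernel_len) * 0.02
        )

    def forward(self, x):                        # x: (B, C, L), L == input_len
        length = x.shape[-1]
        pad = self.kernel_len - 1                # circular positions to append
        x_pad = torch.cat([x, x, x], dim=-1)[..., : length + pad]
        # depth-wise conv: one kernel per channel, output length stays L
        return F.conv1d(x_pad, self.weight, groups=x.shape[1])


class BifurcateGateUnit(nn.Module):
    """Attention-like gating: split features into two branches, transform one,
    turn the other into weights, and multiply them element-wise."""

    def __init__(self, dim: int, input_len: int):
        super().__init__()
        self.proj_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.spatial = OversizedCircularConv(dim, input_len)  # token mixing in the gate branch
        self.proj_out = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                        # x: (B, C, L)
        v, g = self.proj_in(x).chunk(2, dim=1)   # value branch / gate branch
        g = torch.sigmoid(self.spatial(g))       # attention weights in [0, 1] (assumed activation)
        return self.proj_out(v * g)              # element-wise multiplication of the two branches


# Toy usage on a sequence of 49 tokens (e.g. a flattened 7x7 feature map).
block = BifurcateGateUnit(dim=64, input_len=49)
tokens = torch.randn(2, 64, 49)
print(block(tokens).shape)                       # torch.Size([2, 64, 49])
```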


