Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN

09/04/2023
by   Kin Wai Lau, et al.

Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance that surpasses Vision Transformers (ViTs) on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in computational and memory footprint with increasing convolutional kernel size. To mitigate these problems and to enable the use of extremely large convolutional kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded horizontal and vertical 1-D kernels. In contrast to the standard LKA design, this decomposition enables the direct use of depth-wise convolutional layers with large kernels in the attention module, without requiring any extra blocks. We demonstrate that the proposed LSKA module in VAN achieves performance comparable to the standard LKA module while incurring lower computational complexity and memory footprint. We also find that the proposed LSKA design biases VAN more toward the shape than the texture of objects as the kernel size increases. Additionally, we benchmark the robustness of LKA and LSKA in VAN, ViTs, and the recent ConvNeXt on five corrupted versions of the ImageNet dataset that are largely unexplored in previous works. Our extensive experimental results show that the proposed LSKA module in VAN provides a significant reduction in computational complexity and memory footprint with increasing kernel size while outperforming ViTs and ConvNeXt and matching the performance of the LKA module in VAN on object recognition, object detection, semantic segmentation, and robustness tests.
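As a rough illustration of the decomposition described above, the sketch below replaces the k x k depth-wise convolutions of an LKA-style attention block with cascaded 1 x k and k x 1 depth-wise convolutions, so the per-layer kernel cost grows linearly rather than quadratically with k. This is a minimal PyTorch sketch under assumed settings: the kernel sizes, dilation rate, and module layout are illustrative and not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class LSKASketch(nn.Module):
    """Illustrative Large Separable Kernel Attention block (not the authors' code).

    Each k x k depth-wise convolution of an LKA-style block is replaced by a
    cascade of 1 x k and k x 1 depth-wise convolutions. Hyperparameters below
    (k=5, k_d=7, dilation=3) are assumptions for the sake of the example.
    """

    def __init__(self, dim: int, k: int = 5, k_d: int = 7, dilation: int = 3):
        super().__init__()
        # Local context: separable depth-wise convolution (1 x k, then k x 1).
        self.dw_h = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.dw_v = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        # Long-range context: separable dilated depth-wise convolution.
        pad_d = (k_d // 2) * dilation
        self.dwd_h = nn.Conv2d(dim, dim, (1, k_d), padding=(0, pad_d),
                               dilation=(1, dilation), groups=dim)
        self.dwd_v = nn.Conv2d(dim, dim, (k_d, 1), padding=(pad_d, 0),
                               dilation=(dilation, 1), groups=dim)
        # 1x1 convolution for channel mixing.
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw_v(self.dw_h(x))      # separable local depth-wise conv
        attn = self.dwd_v(self.dwd_h(attn)) # separable dilated depth-wise conv
        attn = self.pw(attn)                # channel mixing
        return x * attn                     # reweight input features (attention)


if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)
    print(LSKASketch(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

With this separable form, a block with effective kernel size k stores roughly 2k depth-wise weights per channel instead of k^2, which is what allows very large kernels to be used in the attention module at modest cost.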


