Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets

10/22/2022
by Xiangyu Chen, et al.

Vision Transformers have demonstrated competitive performance on computer vision tasks, benefiting from their ability to capture long-range dependencies with multi-head self-attention modules and multi-layer perceptrons. However, computing global attention brings a disadvantage compared with convolutional neural networks: it requires far more data and computation to converge, which makes it difficult for Vision Transformers to generalize well on small datasets, a setting that is common in practical applications. Previous works focus either on transferring knowledge from large datasets or on adjusting the architecture for small datasets. After carefully examining the self-attention modules, we discover that trivial attention weights far outnumber the important ones, and that these trivial weights, despite being individually small, dominate the attention in Vision Transformers because of their sheer quantity, which the attention mechanism itself does not handle. This accumulated trivial attention can drown out useful non-trivial attention and harm performance when the trivial weights carry more noise, e.g. in the shallow layers of some backbones. To address this issue, we propose to divide attention weights into trivial and non-trivial ones by thresholding, and then to suppress the accumulated trivial attention, a method we call Suppressing Accumulated Trivial Attention (SATA), using the proposed Trivial WeIghts Suppression Transformation (TWIST) to reduce attention noise. Extensive experiments on the CIFAR-100 and Tiny-ImageNet datasets show that our suppression method boosts the accuracy of Vision Transformers by up to 2.3%. Our code is available at https://github.com/xiangyu8/SATA.
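
The mechanism the abstract describes, splitting post-softmax attention weights at a threshold and suppressing the trivial ones, can be illustrated in a few lines of PyTorch. The sketch below is a minimal illustration only: the threshold and scale values are arbitrary, and the simple down-scale-and-renormalize rule is a stand-in for the paper's actual TWIST, whose exact form the abstract does not give (see the linked repository for the authors' implementation).

```python
import torch

def suppress_trivial_attention(attn: torch.Tensor,
                               threshold: float = 1e-2,
                               scale: float = 0.1) -> torch.Tensor:
    """Down-weight trivial (near-zero) attention entries and renormalize.

    attn: post-softmax attention map of shape (batch, heads, query, key),
    where each row along the last dimension sums to 1. The threshold split
    follows the abstract; the scaling rule is an illustrative assumption,
    not the paper's TWIST.
    """
    trivial = attn < threshold                      # mask of trivial weights
    damped = torch.where(trivial, attn * scale, attn)
    # Renormalize so each row remains a probability distribution.
    return damped / damped.sum(dim=-1, keepdim=True)

# Usage: apply right after the softmax inside a self-attention block.
attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
out = suppress_trivial_attention(attn)
print(out.shape, out.sum(dim=-1)[0, 0, 0])  # rows still sum to ~1.0
```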


Related research

10/25/2022  Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets
02/14/2022  How Do Vision Transformers Work?
03/22/2021  Transformers Solve the Limited Receptive Field for Monocular Depth Prediction
12/23/2022  A Close Look at Spatial Modeling: From Attention to Convolution
05/25/2023  Making Vision Transformers Truly Shift-Equivariant
09/11/2023  CNN or ViT? Revisiting Vision Transformers Through the Lens of Convolution
05/09/2023  LSAS: Lightweight Sub-attention Strategy for Alleviating Attention Bias Problem
