Adaptive Attention Link-based Regularization for Vision Transformers

11/25/2022
by Heegon Jin, et al.

Although transformer networks have recently been employed in various vision tasks with state-of-the-art performance, their lack of an inductive bias means that extensive training data and lengthy training times are required. We present a regularization technique that improves the training efficiency of Vision Transformers (ViT) using trainable links between the channel-wise spatial attention of a pre-trained Convolutional Neural Network (CNN) and the attention heads of the ViT. These trainable links, referred to as the attention augmentation module, are trained jointly with the ViT, accelerating its training and helping it avoid the overfitting caused by a lack of data. From the trained attention augmentation module, we can extract the relevance between each CNN activation map and each ViT attention head, and based on this we also propose an advanced attention augmentation module. Consequently, even with a small amount of data, the proposed method considerably improves the performance of ViT while achieving faster convergence during training.
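To make the mechanism concrete, below is a minimal PyTorch sketch of one way such trainable links could be realized. The module name, the softmax normalizations, the choice of the CLS-to-patch attention row as the matching target, and the MSE penalty are all assumptions for illustration; the paper's exact formulation may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAugmentationModule(nn.Module):
    """Trainable links from CNN channel-wise spatial attention to ViT heads.

    Hypothetical sketch: each ViT head gets a target spatial map that is a
    learned mixture (over CNN channels) of the CNN's per-channel attention.
    """

    def __init__(self, num_cnn_channels: int, num_vit_heads: int):
        super().__init__()
        # One trainable link weight per (ViT head, CNN channel) pair.
        self.links = nn.Parameter(torch.zeros(num_vit_heads, num_cnn_channels))

    def forward(self, cnn_feats: torch.Tensor, grid: int) -> torch.Tensor:
        # cnn_feats: (B, C, H, W) activations from a frozen, pre-trained CNN.
        # Resample each channel to the ViT patch grid and normalize spatially,
        # giving a channel-wise spatial attention distribution per channel.
        maps = F.adaptive_avg_pool2d(cnn_feats, grid)      # (B, C, g, g)
        maps = F.softmax(maps.flatten(2), dim=-1)          # (B, C, g*g)
        # Softmax over channels so each head blends CNN channels convexly.
        w = F.softmax(self.links, dim=-1)                  # (heads, C)
        return torch.einsum("hc,bcn->bhn", w, maps)        # (B, heads, g*g)


def attention_link_loss(vit_attn: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # vit_attn: (B, heads, N, N) attention of one ViT layer. We compare the
    # CLS-to-patch row against the CNN-derived targets (an assumption here;
    # the paper may match a different slice or use a different distance).
    cls_to_patch = vit_attn[:, :, 0, 1:]                   # (B, heads, N-1)
    cls_to_patch = cls_to_patch / cls_to_patch.sum(-1, keepdim=True).clamp_min(1e-8)
    return F.mse_loss(cls_to_patch, targets)

In use, the target maps would come from a frozen CNN's features (e.g., grid=14 so that g*g matches the 196 patch tokens of a ViT-B/16 at 224x224 input), and the loss would be added to the classification objective with a weighting coefficient. The link weights receive gradients alongside the ViT parameters, matching the joint training of the attention augmentation module described above.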


Related research

06/18/2021 · How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Vision Transformers (ViT) have been shown to attain highly competitive p...

10/12/2022 · Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
There still remains an extreme performance gap between Vision Transforme...

06/07/2021 · ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Transformers have shown great potential in various computer vision tasks...

10/25/2022 · Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets
Vision Transformers have attracted a lot of attention recently since the...

05/15/2023 · Enhancing Performance of Vision Transformers on Small Datasets through Local Inductive Bias Incorporation
Vision transformers (ViTs) achieve remarkable performance on large datas...

05/16/2023 · CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images
Transformers, due to their ability to learn long range dependencies, hav...

06/09/2021 · CoAtNet: Marrying Convolution and Attention for All Data Sizes
Transformers have attracted increasing interests in computer vision, but...
