Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets

10/25/2022
by   Xiangyu Chen, et al.

Vision Transformers have attracted considerable attention since the Vision Transformer (ViT) was successfully applied to vision tasks. Through their multi-head self-attention modules, vision Transformers can inherently capture long-range dependencies. However, these attention modules normally need to be trained on large datasets, and vision Transformers trained from scratch on small datasets perform worse than widely used backbones such as ResNets. Note that the Transformer model was first proposed for natural language processing, where the input carries denser information than natural images. To boost the performance of vision Transformers on small datasets, this paper proposes to explicitly increase the input information density in the frequency domain. Specifically, channels are selected by computing channel-wise heatmaps in the frequency domain using the Discrete Cosine Transform (DCT), which reduces the size of the input while keeping most of the information and hence increases the information density. As a result, 25 achieved compared with previous work. Extensive experiments demonstrate the effectiveness of the proposed approach on five small-scale datasets: CIFAR-10/100, SVHN, Flowers-102, and Tiny ImageNet. The accuracy has been boosted up to 17.05 https://github.com/xiangyu8/DenseVT.
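To make the idea of selecting frequency channels more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation (see the repository linked above for that). It assumes 8x8 blocks as in JPEG, a single-channel image stored as nested lists, and a simple stand-in score (mean absolute DCT coefficient per frequency) in place of the paper's channel-wise heatmaps. The names `frequency_channels` and `select_channels` are hypothetical.

```python
import math

BLOCK = 8  # assumed block size (as in JPEG); the abstract does not specify one


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n nested list."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2(block, basis):
    """2D DCT-II of a square block, computed as basis @ block @ basis^T."""
    n = len(block)
    tmp = [[sum(basis[u][i] * block[i][j] for i in range(n)) for j in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][j] * basis[v][j] for j in range(n)) for v in range(n)]
            for u in range(n)]


def frequency_channels(image):
    """Tile a 2D image into BLOCK x BLOCK blocks and group DCT coefficients.

    Returns a dict mapping each frequency index (u, v) to the list of
    coefficients at that frequency, one entry per block. Each (u, v) entry
    plays the role of one "frequency channel" in this sketch.
    """
    basis = dct_matrix(BLOCK)
    height, width = len(image), len(image[0])
    channels = {(u, v): [] for u in range(BLOCK) for v in range(BLOCK)}
    for r in range(0, height - BLOCK + 1, BLOCK):
        for c in range(0, width - BLOCK + 1, BLOCK):
            tile = [row[c:c + BLOCK] for row in image[r:r + BLOCK]]
            coeffs = dct2(tile, basis)
            for u in range(BLOCK):
                for v in range(BLOCK):
                    channels[(u, v)].append(coeffs[u][v])
    return channels


def select_channels(channels, keep):
    """Rank channels by mean absolute coefficient and keep the top `keep`.

    This score is an illustrative assumption, not the paper's heatmap.
    """
    def score(key):
        values = channels[key]
        return sum(abs(x) for x in values) / len(values)

    return sorted(channels, key=score, reverse=True)[:keep]


# Usage: a smooth 16x16 gradient concentrates its energy at low frequencies.
image = [[float(r + c) for c in range(16)] for r in range(16)]
channels = frequency_channels(image)
selected = select_channels(channels, keep=16)
```

Keeping 16 of the 64 frequency channels shrinks this input by a factor of four. On smooth images such as the gradient above, most of the energy sits in the retained low-frequency channels, which is the sense in which the retained input is denser.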


