EIT: Efficiently Lead Inductive Biases to ViT

03/14/2022
by Rui Xia, et al.

Vision Transformer (ViT) depends on properties similar to the inductive biases inherent in Convolutional Neural Networks (CNNs) to perform well on non-ultra-large-scale datasets. In this paper, we propose an architecture called Efficiently lead Inductive biases to ViT (EIT), which effectively leads inductive biases into both phases of ViT. In the Patches Projection phase, a convolutional max-pooling structure is used to produce overlapping patches. In the Transformer Encoder phase, we design a novel inductive-bias structure called decreasing convolution, which is placed in parallel with the multi-head attention module and processes the different channels of the embedding separately. On four popular small-scale datasets, EIT improves accuracy over ViT by 12.6% with fewer parameters and FLOPs. Compared with ResNet, EIT exhibits higher accuracy with only 17.7% of the parameters and fewer FLOPs. Finally, ablation studies show that EIT is efficient and does not require position embedding. Code is coming soon: https://github.com/MrHaiPi/EIT
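The abstract only names the two modules, so the following minimal PyTorch sketch illustrates the general ideas rather than the paper's exact design: an overlapping patch projection built from a convolution followed by max-pooling, and a convolutional branch running in parallel with multi-head attention that handles channel groups separately. All layer names, kernel sizes, strides, and the grouped-convolution reading of "decreasing convolution" are assumptions.

```python
# Hypothetical sketch of the two EIT ideas named in the abstract.
# Layer names, kernel sizes, strides, and the grouped reading of
# "decreasing convolution" are assumptions, not the paper's design.
import torch
import torch.nn as nn


class OverlappingPatchProjection(nn.Module):
    """Conv + max-pool stem producing overlapping patches (assumed sizes)."""

    def __init__(self, in_ch: int = 3, embed_dim: int = 192):
        super().__init__()
        # Stride smaller than kernel size -> neighbouring patches overlap.
        self.conv = nn.Conv2d(in_ch, embed_dim, kernel_size=7, stride=2, padding=3)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        x = self.pool(self.conv(x))                       # (B, D, H/4, W/4)
        return x.flatten(2).transpose(1, 2)               # (B, N, D) tokens


class ParallelConvAttention(nn.Module):
    """Multi-head attention with a grouped conv branch in parallel."""

    def __init__(self, dim: int = 192, heads: int = 3, groups: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # groups > 1 so different channel groups are convolved separately,
        # one possible reading of "channels are processed respectively".
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, D)
        a, _ = self.attn(x, x, x)                         # attention branch
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)  # conv branch
        return x + a + c                                  # residual sum
```

Under these assumed strides, a (B, 3, 224, 224) image passes through OverlappingPatchProjection to a (B, 3136, 192) token sequence, which ParallelConvAttention consumes unchanged.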

Related research

10/13/2022 · How to Train Vision Transformer on Small-scale Datasets?
Vision Transformer (ViT), a radically different architecture than convol...

10/12/2022 · Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
There still remains an extreme performance gap between Vision Transforme...

08/19/2023 · Inductive-bias Learning: Generating Code Models with Large Language Model
Large Language Models (LLMs) have been attracting attention due to an abil...

10/04/2022 · Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
There are two de facto standard architectures in recent computer vision:...

05/28/2021 · On the Bias Against Inductive Biases
Borrowing from the transformer models that revolutionized the field of n...

11/19/2021 · Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
A vision transformer (ViT) is the dominant model in the computer vision ...

02/15/2021 · Translational Equivariance in Kernelizable Attention
While Transformer architectures have shown remarkable success, they are b...
