Number of Attention Heads vs Number of Transformer-Encoders in Computer Vision

09/15/2022
by Tomas Hrycej, et al.

Determining an appropriate number of attention heads on the one hand and the number of transformer encoders on the other is an important choice for Computer Vision (CV) tasks using the Transformer architecture. Computational experiments confirmed the expectation that the total number of parameters has to satisfy the condition of overdetermination (i.e., the number of constraints significantly exceeding the number of parameters); only then can good generalization performance be expected. This condition sets the boundaries within which the number of heads and the number of transformer encoders can be chosen. If the role of context in the images to be classified can be assumed to be small, it is favorable to use multiple transformer encoders with a low number of heads (such as one or two). In classifying objects whose class may heavily depend on the context within the image (i.e., where the meaning of a patch depends on other patches), the number of heads is as important as the number of transformer encoders.
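To make the overdetermination condition concrete, the following is a minimal sketch (not taken from the paper) that compares the number of training constraints (samples times output values) with an approximate parameter count for a ViT-style encoder stack. The formula assumes a standard pre-norm encoder with an MLP expansion factor of 4 and head dimension equal to embed_dim / num_heads, in which case the head count does not change the parameter count, while the depth (number of encoders) does; all dimensions below are hypothetical.

```python
# Sketch: check the overdetermination condition (constraints >> parameters)
# for a ViT-style transformer encoder stack. Assumed, not from the paper.

def encoder_params(embed_dim: int, num_layers: int, mlp_ratio: int = 4) -> int:
    """Approximate trainable parameters in the encoder stack.

    Per layer: Q, K, V, and output projections (with biases), a two-layer
    MLP with the given expansion ratio, and two LayerNorms (scale + shift).
    Patch embedding and the classification head are not counted here.
    """
    attn = 4 * embed_dim * embed_dim + 4 * embed_dim
    mlp = 2 * mlp_ratio * embed_dim * embed_dim + (mlp_ratio + 1) * embed_dim
    norms = 2 * 2 * embed_dim
    return num_layers * (attn + mlp + norms)


def overdetermination_ratio(num_samples: int, num_outputs: int,
                            embed_dim: int, num_layers: int) -> float:
    """Constraints / parameters; values well above 1 indicate overdetermination."""
    constraints = num_samples * num_outputs  # one constraint per output per sample
    return constraints / encoder_params(embed_dim, num_layers)


if __name__ == "__main__":
    # Hypothetical CIFAR-10-sized setting: 50k training images, 10 classes.
    for layers in (2, 4, 8, 12):
        q = overdetermination_ratio(50_000, 10, embed_dim=192, num_layers=layers)
        print(f"{layers:2d} encoders -> constraints/parameters = {q:.2f}")
```

Note that this counts only encoder parameters; the patch embedding and the classification head would add to the total and tighten the condition further.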
