Spatial Entropy Regularization for Vision Transformers

06/09/2022
by Elia Peruzzo, et al.

Recent work has shown that the attention maps of Vision Transformers (VTs), when trained with self-supervision, can contain a semantic segmentation structure which does not spontaneously emerge when training is supervised. In this paper, we explicitly encourage the emergence of this spatial clustering as a form of training regularization, thereby incorporating a self-supervised pretext task into standard supervised learning. In more detail, we propose a VT regularization method based on a spatial formulation of information entropy. By minimizing the proposed spatial entropy, we explicitly ask the VT to produce spatially ordered attention maps, thus injecting an object-based prior during training. Through extensive experiments, we show that the proposed regularization is beneficial across different training scenarios, datasets, downstream tasks and VT architectures. The code will be available upon acceptance.
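To make the idea concrete, the sketch below shows how an entropy-based penalty on attention maps can be added to a standard supervised loss. It is not the authors' exact formulation: their spatial entropy accounts for the 2D layout of the attended patches, whereas this stand-in uses the plain Shannon entropy of the [CLS]-token attention over patch locations. The function name, tensor shapes, and the weight lambda_se are illustrative assumptions.

    import torch

    def attention_entropy(attn, eps=1e-8):
        # attn: (B, H, N) attention of the [CLS] token over the N patch
        # tokens, one map per head. Simplified stand-in for the paper's
        # spatial entropy: the Shannon entropy of each attention map.
        p = attn / (attn.sum(dim=-1, keepdim=True) + eps)  # spatial distribution
        return -(p * (p + eps).log()).sum(dim=-1).mean()   # mean over batch, heads

    # Illustrative use in a supervised training step (lambda_se assumed):
    # loss = F.cross_entropy(logits, labels) + lambda_se * attention_entropy(cls_attn)

Minimizing this term concentrates each head's attention mass on a few patches; the spatial formulation described in the abstract goes further by also rewarding attention that clusters into coherent 2D regions, which plain Shannon entropy ignores.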


Related research

09/13/2023
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?
Convolutional networks and vision transformers have different forms of p...

10/27/2022
PatchRot: A Self-Supervised Technique for Training Vision Transformers
Vision transformers require a huge amount of labeled data to outperform ...

12/07/2022
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Vision Transformers (ViTs) have gained significant popularity in recent ...

10/16/2022
Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers
Automatic data augmentation (AutoAugment) strategies are indispensable i...

12/29/2022
AttEntropy: Segmenting Unknown Objects in Complex Scenes using the Spatial Attention Entropy of Semantic Segmentation Transformers
Vision transformers have emerged as powerful tools for many computer vis...

06/01/2023
Affinity-based Attention in Self-supervised Transformers Predicts Dynamics of Object Grouping in Humans
The spreading of attention has been proposed as a mechanism for how huma...

10/06/2020
Guiding Attention for Self-Supervised Learning with Transformers
In this paper, we propose a simple and effective technique to allow for ...
