Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?

09/13/2023
by Bill Psomas, et al.

Convolutional networks and vision transformers have different forms of pairwise interactions, pooling across layers, and pooling at the end of the network. Does the latter really need to be different? As a by-product of pooling, vision transformers provide spatial attention for free, but this attention is most often of low quality unless the model is self-supervised, a phenomenon that is not well studied. Is supervision really the problem? In this work, we develop a generic pooling framework and then formulate a number of existing methods as instantiations of it. By discussing the properties of each group of methods, we derive SimPool, a simple attention-based pooling mechanism that replaces the default one in both convolutional and transformer encoders. We find that, whether supervised or self-supervised, SimPool improves performance on pre-training and downstream tasks and provides attention maps that delineate object boundaries in all cases. One could thus call SimPool universal. To our knowledge, we are the first to obtain attention maps in supervised transformers of at least as good quality as in self-supervised ones, without explicit losses or modifying the architecture. Code at: https://github.com/billpsomas/simpool.
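To make the idea of attention-based pooling concrete, here is a minimal PyTorch sketch of single-query cross-attention pooling in the spirit of the abstract: the query is initialized as the global average of the patch features, and the pooled representation is an attention-weighted sum over patches, so the attention weights themselves form a spatial map that can be visualized. This is an illustrative approximation, not the authors' implementation; all names below (AttentionPool, q_proj, k_proj) are hypothetical, and the official code lives at the repository linked above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Illustrative single-query attention pooling (not the official SimPool code)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)  # projects the pooled query
        self.k_proj = nn.Linear(dim, dim, bias=False)  # projects the patch keys
        self.scale = dim ** -0.5                       # standard dot-product scaling

    def forward(self, x: torch.Tensor):
        # x: (batch, num_patches, dim) patch features from any encoder,
        # convolutional (flattened feature map) or transformer (patch tokens).
        q = self.q_proj(x.mean(dim=1, keepdim=True))   # (B, 1, D): GAP-initialized query
        k = self.k_proj(x)                             # (B, N, D)
        attn = F.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)  # (B, 1, N)
        pooled = (attn @ x).squeeze(1)                 # (B, D): replaces CLS/GAP output
        return pooled, attn.squeeze(1)                 # attention map (B, N) over patches

# Usage sketch: ViT-S/16-like tokens on a 14x14 grid (dimensions assumed).
pool = AttentionPool(dim=384)
feats = torch.randn(2, 196, 384)
pooled, attn = pool(feats)                 # pooled: (2, 384); attn: (2, 196)
attn_map = attn.reshape(2, 14, 14)         # spatial attention over the patch grid

Because the pooled vector is a weighted sum over all patches, the softmax weights double as a free spatial attention map, which is exactly the quantity the paper inspects for object-boundary delineation.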



Related research

Self-Supervised Learning with Swin Transformers (05/10/2021)
We are witnessing a modeling shift from CNN to Transformers in computer ...

Spatial Entropy Regularization for Vision Transformers (06/09/2022)
Recent work has shown that the attention maps of Vision Transformers (VT...

Guiding Attention for Self-Supervised Learning with Transformers (10/06/2020)
In this paper, we propose a simple and effective technique to allow for ...

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning (10/11/2021)
Studies on self-supervised visual representation learning (SSL) improve ...

Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers (10/16/2022)
Automatic data augmentation (AutoAugment) strategies are indispensable i...

Emergence of Segmentation with Minimalistic White-Box Transformers (08/30/2023)
Transformer-like models for vision tasks have recently proven effective ...

Efficient Transformers with Dynamic Token Pooling (11/17/2022)
Transformers achieve unrivalled performance in modelling language, but r...
