AttEntropy: Segmenting Unknown Objects in Complex Scenes using the Spatial Attention Entropy of Semantic Segmentation Transformers

12/29/2022
by   Krzysztof Lis, et al.

Vision transformers have emerged as powerful tools for many computer vision tasks. It has been shown that their features and class tokens can be used for salient object segmentation. However, the properties of segmentation transformers remain largely unstudied. In this work, we conduct an in-depth study of the spatial attentions of different backbone layers of semantic segmentation transformers and uncover interesting properties. The spatial attentions of a patch intersecting with an object tend to concentrate within the object, whereas the attentions of larger, more uniform image areas tend to spread out diffusely. In other words, vision transformers trained to segment a fixed set of object classes generalize to objects well beyond this set. We exploit this by extracting heatmaps that can be used to segment unknown objects within diverse backgrounds, such as obstacles in traffic scenes. Our method is training-free and its computational overhead is negligible. We use off-the-shelf transformers trained for street-scene segmentation to process other scene types.
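The idea of turning attention concentration into a heatmap can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the attention of one backbone layer is available as a row-normalised array of shape (heads, N, N) over N = h x w patches, averages the heads, and computes the Shannon entropy of each patch's attention distribution. The function names `attention_entropy_heatmap` and `toy_attention`, the head averaging, and the synthetic input are all assumptions made for illustration; the layer selection and aggregation used in the paper may differ.

```python
import numpy as np


def attention_entropy_heatmap(attn, grid_hw, eps=1e-12):
    """Per-patch entropy of spatial attention (illustrative sketch).

    attn:    array of shape (heads, N, N); each row is assumed to sum to 1
             over the key axis (i.e. it is a softmax output).
    grid_hw: (h, w) patch grid with h * w == N.
    Returns an (h, w) map; low values mean concentrated attention.
    """
    h, w = grid_hw
    attn = np.asarray(attn, dtype=np.float64)
    if attn.ndim != 3 or attn.shape[1] != h * w or attn.shape[2] != h * w:
        raise ValueError("attn must have shape (heads, h*w, h*w)")
    # Averaging heads is an assumption; the mean of row-normalised
    # matrices is still row-normalised.
    mean_attn = attn.mean(axis=0)
    # Shannon entropy of each query patch's distribution over key patches.
    entropy = -(mean_attn * np.log(mean_attn + eps)).sum(axis=-1)
    return entropy.reshape(h, w)


def toy_attention(object_mask, sharpness=8.0):
    """Hypothetical synthetic attention for demonstration only.

    Patches inside object_mask attend mostly to other object patches;
    all remaining patches attend uniformly to the whole image.
    """
    mask = np.asarray(object_mask, dtype=bool).ravel()
    n = mask.size
    attn = np.full((n, n), 1.0 / n)          # diffuse background rows
    logits = sharpness * mask.astype(np.float64)
    focused = np.exp(logits - logits.max())
    focused /= focused.sum()                 # softmax over key patches
    attn[mask] = focused                     # concentrated object rows
    return attn[None]                        # add a single head axis


if __name__ == "__main__":
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True                    # a 2x2 "object" in the centre
    heatmap = attention_entropy_heatmap(toy_attention(mask), (4, 4))
    print(np.round(heatmap, 2))
```

In this toy setup the object patches receive lower entropy than the background patches, so thresholding or inverting the map would highlight the object. How such a map is thresholded or combined across layers is left open here.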


