Understanding The Robustness in Vision Transformers

04/26/2022
by   Daquan Zhou, et al.

Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations. We further propose a family of fully attentional networks (FANs) that strengthen this capability by incorporating an attentional channel processing design. We validate the design comprehensively on various hierarchical backbones. Our model achieves a state-of-the-art 87.1% accuracy on ImageNet-1k and 35.8% mCE on ImageNet-C, and demonstrates state-of-the-art accuracy and robustness in two downstream tasks: semantic segmentation and object detection. Code will be available at https://github.com/NVlabs/FAN.
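The "attentional channel processing" the abstract names can be illustrated with a minimal NumPy sketch: instead of tokens attending to tokens, the feature matrix is transposed so that channels attend to one another. This is a hedged toy illustration of the general idea, not the authors' FAN implementation (see the linked repository for that); the shapes, the single-head formulation, and the projection matrices `wq`, `wk`, `wv` are all assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, wq, wk, wv):
    """Single-head attention applied across the channel dimension.

    x: (n_tokens, channels). Ordinary self-attention mixes tokens;
    here the input is transposed so channels play the role of tokens,
    a toy analogue of attentional channel processing.
    """
    xt = x.T                                         # (channels, n_tokens)
    q, k, v = xt @ wq, xt @ wk, xt @ wv              # project each channel
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (channels, channels)
    return (attn @ v).T                              # back to (n_tokens, channels)

rng = np.random.default_rng(0)
n_tokens, channels = 16, 8
x = rng.standard_normal((n_tokens, channels))
# Hypothetical projection weights; each maps a channel's token profile
# to a query/key/value vector of the same length.
wq, wk, wv = (rng.standard_normal((n_tokens, n_tokens)) for _ in range(3))
out = channel_self_attention(x, wq, wk, wv)
print(out.shape)  # (16, 8): same shape as the input
```

The design point this sketches is that the attention matrix here is channels-by-channels, so feature channels are reweighted and mixed by learned affinities rather than by a fixed per-channel MLP.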



Related Research

12/23/2022 · A Close Look at Spatial Modeling: From Attention to Convolution
Vision Transformers have shown great promise recently for many vision ta...

07/21/2022 · Multi Resolution Analysis (MRA) for Approximate Self-Attention
Transformers have emerged as a preferred model for many tasks in natural...

05/22/2023 · HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
Current semantic segmentation models have achieved great success under t...

05/17/2021 · Pay Attention to MLPs
Transformers have become one of the most important architectural innovat...

08/21/2023 · Spatial Transform Decoupling for Oriented Object Detection
Vision Transformers (ViTs) have achieved remarkable success in computer ...

04/23/2022 · Visual Attention Emerges from Recurrent Sparse Reconstruction
Visual attention helps achieve robust perception under noise, corruption...
