Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation

10/15/2021
by Yao Qin, et al.

We investigate the robustness of vision transformers (ViTs) through the lens of their distinctive patch-based architecture, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics and makes the image unrecognizable to humans. This indicates that ViTs rely heavily on features that survive such transformations but are generally not indicative of the semantic class to humans. Further investigation shows that these features are useful but non-robust: ViTs trained on them achieve high in-distribution accuracy but break down under distribution shifts. From this understanding, we ask: can training the model to rely less on these features improve ViT robustness and out-of-distribution performance? We use the images transformed by our patch-based operations as negatively augmented views and introduce losses that regularize training away from using non-robust features. This complements existing research, which mostly focuses on augmenting inputs with semantics-preserving transformations to enforce models' invariance. We show that patch-based negative augmentation consistently improves the robustness of ViTs across a wide set of ImageNet-based robustness benchmarks. Furthermore, we find that patch-based negative augmentation is complementary to traditional (positive) data augmentation, and together they boost performance further. All the code in this work will be open-sourced.
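The abstract does not spell out the patch-based operations, but patch shuffling is one natural instance of a transformation that destroys global semantics while leaving individual patches intact. The sketch below (a hypothetical illustration, not the paper's implementation) splits an image into non-overlapping patches and permutes their positions:

```python
import numpy as np

def patch_shuffle(image, patch_size=16, seed=None):
    """Split an (H, W, C) image into non-overlapping patches and shuffle them.

    This kind of patch-level transformation destroys global semantics while
    preserving local patch statistics. H and W must be divisible by patch_size.
    """
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size
    # Rearrange into a flat list of patches: (gh*gw, patch_size, patch_size, C)
    patches = (image
               .reshape(gh, patch_size, gw, patch_size, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch_size, patch_size, c))
    rng.shuffle(patches)  # permute patch order in place along the first axis
    # Reassemble the shuffled patches back into an (H, W, C) image
    return (patches
            .reshape(gh, gw, patch_size, patch_size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(h, w, c))
```

A human typically cannot recognize the class of a shuffled image, so a model that still classifies it correctly must be relying on patch-local, non-robust cues; such views could serve as the "negatively augmented" inputs the abstract describes.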


Related research

06/06/2019 — Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation
Deploying machine learning systems in the real world requires both high ...

11/20/2021 — Are Vision Transformers Robust to Patch Perturbations?
The recent advances in Vision Transformer (ViT) have demonstrated its im...

07/09/2023 — Random Position Adversarial Patch for Vision Transformers
Previous studies have shown the vulnerability of vision transformers to ...

12/09/2022 — AugNet: Dynamic Test-Time Augmentation via Differentiable Functions
Distribution shifts, which often occur in the real world, degrade the ac...

04/26/2022 — Deeper Insights into ViTs Robustness towards Common Corruptions
Recent literature has shown design strategies from Convolutional Neural ...

11/16/2021 — Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding
Vision transformers (ViTs) have recently demonstrated state-of-the-art p...

03/28/2019 — Model Vulnerability to Distributional Shifts over Image Transformation Sets
We are concerned with the vulnerability of computer vision models to dis...
