Are Vision Transformers Robust to Patch Perturbations?

11/20/2021
by Jindong Gu, et al.

Recent advances in the Vision Transformer (ViT) have demonstrated its impressive performance in image classification, making it a promising alternative to the Convolutional Neural Network (CNN). Unlike CNNs, ViT represents an input image as a sequence of image patches. This patch-wise input representation raises an interesting question: how does ViT perform, compared to CNNs, when individual input image patches are perturbed with natural corruptions or adversarial perturbations? In this work, we study the robustness of vision transformers to patch-wise perturbations. Surprisingly, we find that vision transformers are more robust to naturally corrupted patches than CNNs, whereas they are more vulnerable to adversarial patches. Furthermore, we conduct extensive qualitative and quantitative experiments to understand this robustness to patch perturbations. We reveal that ViT's stronger robustness to naturally corrupted patches and higher vulnerability to adversarial patches are both caused by the attention mechanism. Specifically, the attention mechanism can improve the robustness of vision transformers by effectively ignoring naturally corrupted patches. However, when vision transformers are attacked by an adversary, the attention mechanism can be easily fooled into focusing on the adversarially perturbed patches, causing a misclassification.
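The study above perturbs individual input patches and compares model behavior. A minimal sketch of that setup, assuming a 224x224 RGB image split into a 14x14 grid of 16x16 patches and using Gaussian noise as a stand-in for a natural corruption (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def perturb_patch(image, patch_index, patch_size=16, noise_std=0.1, seed=0):
    """Add Gaussian noise to a single patch of an (H, W, C) image in [0, 1].

    Illustrative only: stands in for the 'naturally corrupted patch'
    setting; an adversarial patch would instead be optimized against
    the model's loss.
    """
    h, w, c = image.shape
    patches_per_row = w // patch_size
    # Locate the top-left corner of the target patch in the grid.
    row = (patch_index // patches_per_row) * patch_size
    col = (patch_index % patches_per_row) * patch_size
    rng = np.random.default_rng(seed)
    out = image.copy()
    noise = rng.normal(0.0, noise_std, size=(patch_size, patch_size, c))
    out[row:row + patch_size, col:col + patch_size] += noise
    # Keep pixel values in the valid range.
    return np.clip(out, 0.0, 1.0)

# Example: corrupt patch 5 (first row of the grid, columns 80..95).
img = np.zeros((224, 224, 3))
corrupted = perturb_patch(img, patch_index=5)
```

Only the targeted patch changes; all other pixels are left untouched, which is what lets the experiments isolate the effect of a single perturbed patch on the model's attention.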


Related research

03/16/2022
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
Vision transformers (ViTs) have recently set off a new wave in neural ar...

11/02/2022
The Lottery Ticket Hypothesis for Vision Transformers
The conventional lottery ticket hypothesis (LTH) claims that there exist...

07/09/2023
Random Position Adversarial Patch for Vision Transformers
Previous studies have shown the vulnerability of vision transformers to ...

03/07/2022
ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches
Adversarial patches are optimized contiguous pixel blocks in an input im...

10/15/2021
Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
We investigate the robustness of vision transformers (ViTs) through the ...

12/07/2021
Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal
Vision transformers (ViTs) have demonstrated impressive performance and ...

03/22/2022
GradViT: Gradient Inversion of Vision Transformers
In this work we demonstrate the vulnerability of vision transformers (Vi...
