Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?

03/16/2022
by   Yonggan Fu, et al.

Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance on various vision tasks. In parallel, as ViTs move toward deployment in real-world vision applications, their robustness against potential malicious attacks has gained increasing attention. In particular, recent works show that ViTs are more robust against adversarial attacks than convolutional neural networks (CNNs), and conjecture that this is because ViTs focus on capturing global interactions among input/feature patches, making them more robust to the local perturbations imposed by adversarial attacks. In this work, we ask an intriguing question: "Under what kinds of perturbations do ViTs become more vulnerable learners compared to CNNs?" Driven by this question, we first conduct comprehensive experiments on the robustness of both ViTs and CNNs under various existing adversarial attacks to understand the underlying reasons for their robustness. Based on the resulting insights, we then propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component (i.e., a single patch) with a series of attention-aware optimization techniques. Interestingly, our Patch-Fool framework shows for the first time that ViTs are not necessarily more robust than CNNs against adversarial perturbations. In particular, ViTs are consistently more vulnerable than CNNs to our Patch-Fool attack across extensive experiments. Moreover, observations from Sparse Patch-Fool and Mild Patch-Fool, two variants of Patch-Fool, suggest that the perturbation density and the perturbation strength on each patch are the key factors that determine the robustness ranking between ViTs and CNNs.
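To make the idea concrete, below is a minimal PyTorch sketch of a single-patch attack in the spirit of Patch-Fool: the entire perturbation budget is concentrated on one input patch and optimized by gradient ascent, bounded only by the valid pixel range. The patch-selection heuristic (largest input-gradient norm), the function name single_patch_attack, and all hyperparameters are illustrative assumptions of this sketch; the paper's actual attack selects patches and shapes the loss using attention scores, which is not reproduced here.

```python
import torch
import torch.nn.functional as F


def single_patch_attack(model, x, y, patch_size=16, steps=250, step_size=0.05):
    """Concentrate an unconstrained perturbation on one patch per image."""
    model.eval()
    x = x.clone().detach()
    b_size, _, _, width = x.shape
    n_cols = width // patch_size

    # Pick the most sensitive patch via input-gradient magnitude -- a rough
    # stand-in for the paper's attention-aware patch selection.
    x_req = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0]
    g = grad.abs().sum(1)                                   # (B, H, W)
    g = g.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    idx = g.sum((-1, -2)).flatten(1).argmax(1)              # (B,)

    # Binary mask: 1 inside the chosen patch, 0 elsewhere.
    mask = torch.zeros_like(x)
    for b in range(b_size):
        r, c = divmod(int(idx[b]), n_cols)
        mask[b, :, r * patch_size:(r + 1) * patch_size,
                   c * patch_size:(c + 1) * patch_size] = 1.0

    # Sign-based gradient ascent on the loss, restricted to the chosen patch
    # and bounded only by the valid pixel range [0, 1].
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        adv = (x + delta * mask).clamp(0.0, 1.0)
        loss = F.cross_entropy(model(adv), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + step_size * grad.sign()).detach().requires_grad_(True)

    return (x + delta * mask).clamp(0.0, 1.0).detach()


if __name__ == "__main__":
    # Smoke test on an untrained torchvision ViT with random data.
    from torchvision.models import vit_b_16

    model = vit_b_16(weights=None)
    x, y = torch.rand(2, 3, 224, 224), torch.tensor([1, 7])
    x_adv = single_patch_attack(model, x, y, steps=10)
    print((x_adv != x).any(dim=1).float().mean().item())  # fraction of pixels touched
```

Under this framing, the Sparse and Mild variants mentioned above would roughly correspond to additionally restricting how many pixels within the patch may change (perturbation density) or projecting the perturbation onto a small L-infinity ball (perturbation strength), which is what makes them useful probes of the robustness ranking between ViTs and CNNs.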


Related research

11/20/2021 - Are Vision Transformers Robust to Patch Perturbations?
10/06/2021 - Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
08/20/2022 - Analyzing Adversarial Robustness of Vision Transformers against Spatial and Spectral Attacks
08/01/2022 - Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem
01/31/2023 - Inference Time Evidences of Adversarial Attacks for Forensic on Transformers
08/27/2022 - TrojViT: Trojan Insertion in Vision Transformers
01/20/2022 - Steerable Pyramid Transform Enables Robust Left Ventricle Quantification
