Are Transformers More Robust Than CNNs?

11/10/2021
by Yutong Bai, et al.

Transformers have emerged as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Surprisingly, however, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and trained with distinct frameworks. In this paper, we aim to provide the first fair, in-depth comparison between Transformers and CNNs, focusing on robustness evaluations. With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs in adversarial robustness. More surprisingly, we find that CNNs can easily be as robust as Transformers at defending against adversarial attacks if they properly adopt Transformers' training recipes. Regarding generalization to out-of-distribution samples, we show that pre-training on (external) large-scale datasets is not a fundamental requirement for Transformers to outperform CNNs. Moreover, our ablations suggest that this stronger generalization stems largely from the Transformer's self-attention architecture itself, rather than from other training setups. We hope this work helps the community better understand and benchmark the robustness of Transformers and CNNs. The code and models are publicly available at https://github.com/ytongbai/ViTs-vs-CNNs.
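To make the "unified training setup" idea concrete, below is a minimal PyTorch sketch of the kind of apples-to-apples adversarial-robustness evaluation the abstract describes: the same PGD attack and the same evaluation loop applied to both a CNN and a Vision Transformer. The attack parameters, model names, and use of timm are illustrative assumptions, not the authors' exact protocol (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=4/255, alpha=1/255, steps=10):
    # Untargeted L-infinity PGD: take signed-gradient ascent steps on the
    # loss, projecting the perturbation back into the eps-ball each step.
    # Assumes inputs live in [0, 1]; fold any normalization into the model.
    adv = images.detach() + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0, 1)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad, = torch.autograd.grad(loss, adv)
        adv = adv.detach() + alpha * grad.sign()
        adv = images + (adv - images).clamp(-eps, eps)  # project to eps-ball
        adv = adv.clamp(0, 1)
    return adv.detach()

def robust_accuracy(model, loader, device="cuda"):
    # Identical attack and measurement loop for every model, so a ViT and
    # a CNN are evaluated under exactly the same conditions.
    model = model.to(device).eval()
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv = pgd_attack(model, images, labels)
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total

# Hypothetical usage with timm (model names are examples of comparing
# architectures at similar scale, not the paper's exact configurations):
#   import timm
#   cnn = timm.create_model("resnet50", pretrained=True)
#   vit = timm.create_model("deit_small_patch16_224", pretrained=True)
#   print(robust_accuracy(cnn, val_loader), robust_accuracy(vit, val_loader))
```

Holding the attack, data pipeline, and model scale fixed in this way is what separates a fair robustness comparison from the mismatched setups the paper critiques.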

