Out-of-Distribution Performance of State-of-the-Art Vision Models

01/25/2023
by Md Salman Rahman, et al.

The vision transformer (ViT) has advanced to the cutting edge of visual recognition tasks. According to recent research, transformers are more robust than CNNs, and the claim attributes this robustness to ViT's self-attention mechanism. Even so, we find that these conclusions rest on unfair experimental conditions and comparisons of only a few models, which do not give a complete picture of robustness performance. In this study, we investigate the performance of 58 state-of-the-art computer vision models in a unified training setup, covering not only attention-based and convolution-based architectures but also networks that combine convolution and attention, sequence-based models, complementary search, and network-based methods. Our research demonstrates that robustness depends on the training setup and model type, and that performance varies with the type of out-of-distribution data. Our research will aid the community in better understanding and benchmarking the robustness of computer vision models.
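To make the evaluation concrete, the sketch below shows one common way to measure out-of-distribution robustness for a single pretrained model: compare top-1 accuracy on clean validation data against accuracy on a corrupted (ImageNet-C-style) copy of the same data. This is a minimal illustration, not the authors' code; the `timm` model name ("resnet50"), the dataset paths, and the fixed preprocessing pipeline are assumptions standing in for the paper's unified training setup across all 58 models.

```python
# Minimal sketch: clean vs. out-of-distribution top-1 accuracy for one model.
# Assumes `timm`, `torch`, and `torchvision` are installed, and that the
# clean and corrupted datasets are stored in ImageFolder-style directories.
import torch
import timm
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = timm.create_model("resnet50", pretrained=True).to(device).eval()

# Standard ImageNet preprocessing; a unified benchmark would hold this
# (and the training recipe) fixed across every evaluated architecture.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def top1_accuracy(model, folder):
    """Top-1 accuracy of `model` on an ImageFolder-style directory."""
    loader = DataLoader(datasets.ImageFolder(folder, preprocess),
                        batch_size=64, num_workers=4)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

# The clean-vs-corrupted accuracy gap is one simple robustness measure
# that can be tabulated per model and per corruption (OOD) type.
clean_acc = top1_accuracy(model, "imagenet/val")            # placeholder path
ood_acc = top1_accuracy(model, "imagenet-c/gaussian_noise/3")  # placeholder path
print(f"clean: {clean_acc:.3f}  OOD: {ood_acc:.3f}  drop: {clean_acc - ood_acc:.3f}")
```

Repeating this measurement over many models and corruption types, under one shared training and preprocessing configuration, is the kind of comparison the study describes.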
