The Robustness Limits of SoTA Vision Models to Natural Variation

10/24/2022
by Mark Ibrahim, et al.

Recent state-of-the-art vision models have introduced new architectures, learning paradigms, and larger pretraining data, leading to impressive performance on tasks such as classification. While previous generations of vision models were shown to lack robustness to factors such as pose, it is unclear to what extent this next generation of models is more robust. To study this question, we develop a dataset of more than 7 million images with controlled changes in pose, position, background, lighting, and size. We study not only how robust recent state-of-the-art models are, but also the extent to which models can generalize to variation in these factors when that variation is present during training. We consider a catalog of recent vision models, including vision transformers (ViT), self-supervised models such as masked autoencoders (MAE), and models trained on larger datasets such as CLIP. We find that, out of the box, even today's best models are not robust to common changes in pose, size, and background. When some samples varied during training, we found that models required a significant portion of diversity to generalize, though robustness did eventually improve. When diversity is seen for only some classes, however, we found that models did not generalize to other classes unless those classes were very similar to the ones seen varying during training. We hope our work will shed further light on the blind spots of SoTA models and spur the development of more robust vision models.

research
11/03/2022

ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

Deep learning vision systems are widely deployed across applications whe...
research
03/17/2022

Are Vision Transformers Robust to Spurious Correlations?

Deep neural networks may be susceptible to learning spurious correlation...
research
04/17/2023

Towards Robust Prompts on Vision-Language Models

With the advent of vision-language models (VLMs) that can perform in-con...
research
10/24/2022

Robust Self-Supervised Learning with Lie Groups

Deep learning has led to remarkable advances in computer vision. Even so...
research
01/25/2023

Out of Distribution Performance of State of Art Vision Model

The vision transformer (ViT) has advanced to the cutting edge in the vis...
research
03/16/2023

Investigating Failures to Generalize for Coreference Resolution Models

Coreference resolution models are often evaluated on multiple datasets. ...
