ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

11/03/2022
by   Badr Youbi Idrissi, et al.

Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples that are challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting for the entire ImageNet-1k validation set as well as a random subset of 12k training images. Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of the model's (1) architecture, e.g. transformer vs. convolutional, (2) learning paradigm, e.g. supervised vs. self-supervised, and (3) training procedures, e.g. data augmentation. Regardless of these choices, we find models have consistent failure modes across ImageNet-X categories. We also find that while data augmentations can improve robustness to certain factors, they induce spill-over effects on other factors. For example, strong random cropping hurts robustness on smaller objects. Together, these insights suggest that, to advance the robustness of modern vision models, future research should focus on collecting additional data and on understanding data augmentation schemes. Along with these insights, we release a toolkit based on ImageNet-X to spur further study into the mistakes image recognition systems make.
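To illustrate the kind of analysis the abstract describes, the sketch below shows one straightforward way to use per-image factor annotations: compare a model's error rate on images annotated with a given factor against its overall error rate. This is a minimal illustration, not the released ImageNet-X toolkit; the file paths, column names ("image_id", "label", "pred", and the factor columns), and the particular factors listed are assumptions made for the example.

    # Minimal sketch (not the released ImageNet-X toolkit API).
    # Assumes a table with one row per validation image, boolean columns for a
    # few factors, and ground-truth / predicted class columns.
    import pandas as pd

    FACTORS = ["pose", "background", "lighting", "occlusion"]  # illustrative subset of the 16 factors

    def per_factor_error_ratios(df: pd.DataFrame) -> pd.Series:
        """Error rate on images showing each factor, divided by the overall error rate."""
        errors = df["pred"] != df["label"]          # per-image mistake indicator
        overall_error = errors.mean()               # overall top-1 error rate
        ratios = {}
        for factor in FACTORS:
            subset = errors[df[factor]]             # mistakes restricted to images with this factor
            if len(subset) > 0:
                ratios[factor] = subset.mean() / overall_error
        return pd.Series(ratios).sort_values(ascending=False)

    # Hypothetical usage: merge annotation and prediction files, then rank factors
    # by how much they inflate the error rate relative to the overall average.
    # annotations = pd.read_csv("imagenet_x_annotations.csv")   # hypothetical path
    # predictions = pd.read_csv("model_predictions.csv")        # hypothetical path
    # df = annotations.merge(predictions, on="image_id")
    # print(per_factor_error_ratios(df))

A ratio above 1 for a factor would indicate the model makes disproportionately many mistakes on images exhibiting that factor, which is the kind of per-factor failure-mode comparison the paper carries out across architectures, learning paradigms, and training procedures.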


