Analyzing ImageNet with Spectral Relevance Analysis: Towards ImageNet un-Hans'ed

12/22/2019
by   Christopher J. Anders, et al.

Today's machine learning models for computer vision are typically trained on very large (benchmark) data sets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploited by the model. In the worst case, the trained model may become a 'Clever Hans' predictor that does not learn a valid and generalizable strategy for the problem it was trained on, but instead bases its decisions on spurious correlations in the training data. Recently developed techniques make it possible to explain individual model decisions and thus to gain deeper insight into the model's prediction strategies. In this paper, we contribute a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora, here ImageNet. Building on a recent technique, Spectral Relevance Analysis (SpRAy), we propose three technical contributions and report the resulting findings: (a) novel similarity metrics based on the Wasserstein distance for comparing attributions, allowing for the first time scale-, translation-, and rotation-invariant comparisons of attributions; (b) a scalable quantification of artifactual and poisoned classes in which the ML models under study exhibit Clever Hans behavior; and (c) a cleaning procedure that systematically relieves the data of artifacts and biases, yielding significantly reduced Clever Hans behavior, i.e. we un-Hans the ImageNet data corpus. Using this novel set of methods, we provide qualitative and quantitative analyses of the biases and artifacts in ImageNet and demonstrate that these insights can give rise to improved models and functionally cleaned data corpora.
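To make the pipeline concrete, here is a minimal sketch of the two steps the abstract describes: comparing attribution heatmaps via a Wasserstein-style distance, then spectrally clustering the resulting distance matrix to surface groups of similar prediction strategies (the SpRAy idea). This is not the paper's implementation: `attribution_distance` and `spray_clusters` are hypothetical names, and the full 2D optimal-transport distance is approximated here by 1D Wasserstein distances between the row and column marginals of each normalized heatmap.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def attribution_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Simplified Wasserstein-style distance between two 2D attribution maps.

    Each map is normalized to a probability distribution; the maps are then
    compared via the 1D Wasserstein distances of their row and column
    marginals -- a cheap proxy for the full 2D optimal-transport distance.
    """
    a = np.abs(a) / np.abs(a).sum()
    b = np.abs(b) / np.abs(b).sum()
    rows = np.arange(a.shape[0])
    cols = np.arange(a.shape[1])
    d_row = wasserstein_distance(rows, rows, a.sum(axis=1), b.sum(axis=1))
    d_col = wasserstein_distance(cols, cols, a.sum(axis=0), b.sum(axis=0))
    return d_row + d_col


def spray_clusters(heatmaps, n_clusters=2):
    """Cluster attribution maps by spectral clustering on pairwise distances."""
    from sklearn.cluster import SpectralClustering

    n = len(heatmaps)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = attribution_distance(heatmaps[i], heatmaps[j])
    # Turn distances into affinities with a simple RBF-style kernel.
    affinity = np.exp(-dist / (dist.mean() + 1e-12))
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(affinity)
```

Because the comparison operates on the spatial distribution of relevance rather than raw pixel overlap, two heatmaps that highlight the same kind of artifact (e.g. a watermark) in slightly different positions end up close in this metric, which is what lets the clustering step group Clever Hans strategies together.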


