DORA: Exploring outlier representations in Deep Neural Networks

06/09/2022
by Kirill Bykov, et al.

Deep Neural Networks (DNNs) draw their power from the representations they learn. In recent years, however, researchers have found that DNNs, while being incredibly effective at learning complex abstractions, also tend to be infected with artifacts, such as biases, Clever Hans effects, or backdoors, due to spurious correlations inherent in the training data. So far, existing methods for uncovering such artifactual and malicious behavior in trained models have focused on finding artifacts in the input data, which requires both the availability of a data set and human intervention. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first automatic, data-agnostic method for detecting potentially infected representations in Deep Neural Networks. We further show that contaminated representations found by DORA can be used to detect infected samples in any given dataset. We evaluate the performance of our proposed method qualitatively and quantitatively, both in controlled toy scenarios and in real-world settings, where we demonstrate the benefit of DORA in safety-critical applications.
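
The core idea, detecting anomalous neuron representations without access to the training data, can be illustrated with a minimal sketch. Suppose each neuron has already been summarized as a vector; a standard outlier detector can then flag representations that deviate from the majority. Everything below (the embedding step, the function name find_outlier_neurons, the toy data, and the choice of scikit-learn's LocalOutlierFactor) is an illustrative assumption, not the paper's actual algorithm.

    # Hypothetical sketch: outlier detection over per-neuron representation
    # vectors, standing in for the kind of representation-level analysis the
    # abstract describes. Not DORA itself.
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    def find_outlier_neurons(representations, n_neighbors=20):
        """Flag anomalous rows of an (n_neurons, d) representation matrix.

        Returns indices of neurons whose representations deviate from the
        rest according to the Local Outlier Factor score.
        """
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)
        labels = lof.fit_predict(representations)  # -1 = outlier, 1 = inlier
        return np.where(labels == -1)[0]

    # Toy usage: 100 "clean" neuron embeddings plus 3 strongly shifted ones
    # standing in for potentially infected representations.
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=(100, 16))
    shifted = rng.normal(5.0, 1.0, size=(3, 16))
    reps = np.vstack([clean, shifted])
    print(find_outlier_neurons(reps))  # the shifted neurons (indices 100-102) should be flagged

Note that no training data enters this step: once the representations are embedded, detection operates purely on the model's internal structure, which is what makes a data-agnostic analysis possible.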


research
01/31/2023

Interpreting Robustness Proofs of Deep Neural Networks

In recent years numerous methods have been developed to formally verify ...
research
03/29/2021

Performance Analysis of Out-of-Distribution Detection on Various Trained Neural Networks

Several areas have been improved with Deep Learning during the past year...
research
03/18/2023

Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks

Deep neural networks (DNNs) are often trained on the premise that the co...
research
01/07/2020

PaRoT: A Practical Framework for Robust Deep Neural Network Training

Deep Neural Networks (DNNs) are finding important applications in safety...
research
07/13/2020

Exclusion and Inclusion – A model agnostic approach to feature importance in DNNs

Deep Neural Networks in NLP have enabled systems to learn complex non-li...
research
06/27/2022

Monitoring Shortcut Learning using Mutual Information

The failure of deep neural networks to generalize to out-of-distribution...
