Revealing Model Biases: Assessing Deep Neural Networks via Recovered Sample Analysis

06/10/2023
by   Mohammad Mahdi Mehmanchi, et al.

This paper proposes a straightforward and cost-effective approach to assess whether a deep neural network (DNN) relies on the primary concepts of its training samples or merely learns discriminative yet simple and irrelevant features that happen to differentiate between classes. The paper highlights that DNNs, as discriminative classifiers, often latch onto the simplest features that separate the classes, which can bias them toward irrelevant features and cause them to fail to generalize. While a generalization test is one way to evaluate a trained model's performance, it can be costly and may not cover all scenarios needed to ensure that the model has learned the primary concepts. Furthermore, even after conducting a generalization test, it may not be possible to identify the bias in the model. Here, the paper proposes a method that recovers samples from the parameters of the trained model and analyzes the reconstruction quality. We believe that if the model's weights are optimized to discriminate based on certain features, those features will be reflected in the reconstructed samples. If the recovered samples contain the primary concepts of the training data, it can be concluded that the model has learned the essential and determining features. Conversely, if the recovered samples contain irrelevant features, it can be concluded that the model is biased toward those features. The proposed method requires no test or generalization samples, only the parameters of the trained model and the training data that lie on the margin. Our experiments demonstrate that the proposed method can determine whether the model has learned the desired features of the training data. The paper highlights that our understanding of how these models work is limited, and the proposed approach helps address this gap.
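The core intuition, that features the weights discriminate on reappear in samples recovered from those weights, can be illustrated with a toy sketch. This is not the paper's actual reconstruction procedure; it is a minimal, hypothetical example using a synthetic two-feature dataset in which feature 0 plays the role of the "primary concept" and feature 1 is a spurious shortcut, with recovery done by simple gradient ascent on the class score of a linear classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: feature 0 is the "primary concept",
# feature 1 is a spurious shortcut that separates the classes even more
# cleanly -- mimicking the bias scenario described in the abstract.
n = 200
y = rng.integers(0, 2, n).astype(float)
X = np.stack([
    (2 * y - 1) + 0.3 * rng.standard_normal(n),      # primary concept
    3 * (2 * y - 1) + 0.1 * rng.standard_normal(n),  # easy shortcut
], axis=1)

# Train a linear classifier (logistic regression by gradient descent).
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.1 * X.T @ grad / n
    b -= 0.1 * grad.mean()

# "Recover" a class-1 sample by gradient ascent on the class score.
# For a linear model the input gradient is just w, so the recovered
# direction is w / ||w|| -- it exposes which features the weights encode.
x = np.zeros(2)
for _ in range(100):
    x += 0.1 * w
x /= np.linalg.norm(x)

# If the shortcut coordinate dominates the recovered sample, the model
# is biased toward the irrelevant feature rather than the primary one.
print("recovered direction:", np.round(x, 3))
print("shortcut dominates:", abs(x[1]) > abs(x[0]))
```

Because the shortcut feature separates the classes with a larger margin, the trained weight on it is larger, and the recovered direction is dominated by the shortcut coordinate, which is exactly the kind of bias the proposed analysis is meant to surface.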

