From Hope to Safety: Unlearning Biases of Deep Models by Enforcing the Right Reasons in Latent Space

08/18/2023
by   Maximilian Dreyer, et al.

Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying such models for high-stakes decision-making, for instance in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method that ensures the right reasons at the concept level by reducing the model's sensitivity toward biases via the gradient. When modeling biases with Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to yield diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet, and CelebA datasets using VGG, ResNet, and EfficientNet architectures.
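The core idea described in the abstract can be sketched with a toy example: model a bias as a direction (a Concept Activation Vector) in latent space, then penalize how strongly the latent gradient aligns with that direction. The function names, the mean-difference CAV fit, and the synthetic activations below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fit_cav(acts_concept, acts_random):
    """Assumed signal-based CAV: normalized difference of class means.
    (A robust alternative to taking an SVM hyperplane normal, which the
    abstract notes can yield diverging directions.)"""
    v = acts_concept.mean(axis=0) - acts_random.mean(axis=0)
    return v / np.linalg.norm(v)

def bias_sensitivity_loss(latent_grad, cav):
    """Penalize sensitivity toward the bias concept: the squared
    projection of the latent gradient onto the CAV direction."""
    return float(np.dot(latent_grad, cav) ** 2)

# Synthetic latent activations: the "concept" shifts dimension 0.
rng = np.random.default_rng(0)
d = 8
acts_concept = rng.normal(0.0, 0.1, (32, d)) + np.eye(d)[0]
acts_random = rng.normal(0.0, 0.1, (32, d))
cav = fit_cav(acts_concept, acts_random)

grad = np.ones(d)  # toy latent gradient of the model output
loss_before = bias_sensitivity_loss(grad, cav)
# Removing the CAV-aligned component drives the penalty toward zero,
# i.e., the model is no longer sensitive to the bias direction.
loss_after = bias_sensitivity_loss(grad - np.dot(grad, cav) * cav, cav)
print(loss_before, loss_after)
```

In training, a penalty of this form would be added to the task loss so that gradient descent reduces the model's use of the bias concept, rather than augmenting the feature space and hoping the bias is ignored.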


