Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

10/07/2021
by Xiaochuang Han, et al.

Among the most critical limitations of deep learning NLP models are their lack of interpretability and their reliance on spurious correlations. Prior work proposed various approaches to interpreting black-box models to unveil spurious correlations, but this research was primarily used in human-computer interaction scenarios. It remains underexplored whether or how such model interpretations can be used to automatically "unlearn" confounding features. In this work, we propose influence tuning, a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels. We show that in a controlled setup, influence tuning can help deconfound the model from spurious patterns in the data, significantly outperforming baseline methods that use adversarial training.
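
As a rough illustration of what instance attribution can mean in this setting, the sketch below scores a training example against a probe example by the dot product of their loss gradients. This gradient-similarity score is a simple stand-in chosen for brevity and is not necessarily the attribution method used in the paper; the toy logistic model, the data, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad(w, x, y):
    """Gradient of the logistic loss for one example (x, y) w.r.t. weights w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def attribution(w, train_example, probe_example):
    """Gradient dot product (an assumed, simplified attribution score).

    A positive score means a gradient step on the training example
    would also reduce the loss on the probe example.
    """
    g_train = loss_grad(w, *train_example)
    g_probe = loss_grad(w, *probe_example)
    return sum(a * b for a, b in zip(g_train, g_probe))

# Hypothetical toy data: feature 0 is the task signal, feature 1 is a confound.
w = [0.5, 0.5]
probe = ([1.0, 1.0], 1)
same_signal = ([1.0, 0.0], 1)    # shares only the task feature with the probe
same_confound = ([0.0, 1.0], 1)  # shares only the confound with the probe

score_signal = attribution(w, same_signal, probe)
score_confound = attribution(w, same_confound, probe)
```

In this toy setup both training examples receive the same positive score, so the model credits the confound as much as the task feature. A procedure in the spirit of the abstract would add an objective that pushes the confound-sharing score down while still fitting the task labels.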
