Asymmetric feature interaction for interpreting model predictions

05/12/2023
by Xiaolei Lu, et al.

In natural language processing (NLP), deep neural networks (DNNs) can model complex contextual interactions and have achieved impressive results on a range of NLP tasks. Prior work on feature interaction attribution mainly studies symmetric interaction, which explains only the additional influence of a set of words in combination and therefore fails to capture the asymmetric influence that contributes to model predictions. In this work, we propose an asymmetric feature interaction attribution model that explores asymmetric higher-order feature interactions in the inference of deep neural NLP models. By representing our explanation as a directed interaction graph, we experimentally demonstrate that the graph is interpretable and can reveal asymmetric feature interactions. Experimental results on two sentiment classification datasets show that our model outperforms state-of-the-art feature interaction attribution methods in identifying influential features for model predictions. Our code is available at https://github.com/StillLu/ASIV.
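To make the symmetric/asymmetric distinction concrete, here is a minimal sketch on a two-word toy example. It is not the paper's attribution method: the score table `TOY_SCORES`, the helper names, and the use of a conditional contribution as the directed edge weight are all assumptions chosen for illustration. It only shows that a standard pairwise interaction score is the same in both directions, while a directed quantity can differ between "not" acting on "good" and "good" acting on "not".

```python
from itertools import permutations

# Hypothetical toy "model": sentiment score for each set of present words.
# These numbers are invented for illustration and come from no real model.
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"not"}): 0.0,
    frozenset({"good"}): 1.0,
    frozenset({"not", "good"}): -1.0,
}


def score(words):
    """Look up the toy score for a set of present words."""
    return TOY_SCORES[frozenset(words)]


def symmetric_interaction(a, b):
    """Pairwise interaction: the joint effect beyond the two individual effects.

    Swapping a and b leaves the value unchanged, so it carries no direction.
    """
    return score({a, b}) - score({a}) - score({b}) + score(set())


def conditional_contribution(target, context):
    """Contribution of `target` when `context` is already present.

    Used here as an assumed, simplified stand-in for a directed influence.
    """
    return score({target, context}) - score({context})


def directed_graph(words):
    """Edge (src, dst) holds the contribution of dst given that src is present."""
    return {
        (src, dst): conditional_contribution(dst, src)
        for src, dst in permutations(words, 2)
    }


if __name__ == "__main__":
    print(symmetric_interaction("not", "good"))
    print(symmetric_interaction("good", "not"))
    for edge, weight in directed_graph(["not", "good"]).items():
        print(edge, weight)
```

In this toy setting the symmetric score is identical for both orderings, whereas the two directed edges carry different weights, which is the kind of information a directed interaction graph can represent.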


