The Weighted Möbius Score: A Unified Framework for Feature Attribution

05/16/2023
by Yifan Jiang, et al.

Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction. Recent work has extended feature attribution to interactions between multiple features. However, the lack of a unified framework has led to a proliferation of methods that are often not directly comparable. This paper introduces a parameterized attribution framework – the Weighted Möbius Score – and (i) shows that many different attribution methods for both individual features and feature interactions are special cases and (ii) identifies some new methods. By studying the vector space of attribution methods, our framework utilizes standard linear algebra tools and provides interpretations in various fields, including cooperative game theory and causal mediation analysis. We empirically demonstrate the framework's versatility and effectiveness by applying these attribution methods to feature interactions in sentiment analysis and chain-of-thought prompting.
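To make the "special cases" idea concrete, here is a minimal sketch, not the paper's implementation. It computes the classical Möbius transform of a set function, m(S) = Σ_{T ⊆ S} (-1)^{|S|-|T|} v(T), and then forms a weighted sum of the Möbius coefficients of every coalition containing a feature. With weight 1/|S| this sum is the Shapley value, a standard result in cooperative game theory. The abstract does not give the framework's actual parameterization, so the size-only weighting, the function names (`mobius_transform`, `weighted_score`), and the toy value function `v` are all assumptions made for illustration.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mobius_transform(v, features):
    """Return {S: m(S)} with m(S) = sum_{T subset of S} (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(features)
    }


def weighted_score(m, i, weight):
    """Sum weight(|S|) * m(S) over all coalitions S that contain feature i.

    Assumption: the weight depends only on coalition size. This is an
    illustrative choice, not the parameterization used in the paper.
    """
    return sum(weight(len(S)) * val for S, val in m.items() if i in S)


# Hypothetical toy value function: two main effects plus one interaction.
def v(S):
    return 1.0 * ("a" in S) + 2.0 * ("b" in S) + 4.0 * ({"a", "b"} <= S)


features = ["a", "b"]
m = mobius_transform(v, features)

# Weight 1/|S| splits each interaction equally among its members,
# which recovers the Shapley value.
shapley = {i: weighted_score(m, i, lambda k: 1.0 / k) for i in features}
```

In this toy game the interaction coefficient m({a, b}) is 4, and the two Shapley values sum to v({a, b}) - v(∅) = 7. Swapping in a different `weight` function yields a different attribution rule from the same Möbius coefficients, which is the sense in which a single weighted sum can cover several methods.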

