A General Taylor Framework for Unifying and Revisiting Attribution Methods

by   Huiqi Deng, et al.

Attribution methods provide an insight into the decision-making process of machine learning models, especially deep neural networks, by assigning contribution scores to each individual feature. However, the attribution problem has not been well-defined, which lacks a unified guideline to the contribution assignment process. Furthermore, existing attribution methods often built upon various empirical intuitions and heuristics. There still lacks a general theoretical framework that not only can offer a good description of the attribution problem, but also can be applied to unifying and revisiting existing attribution methods. To bridge the gap, in this paper, we propose a Taylor attribution framework, which models the attribution problem as how to decide individual payoffs in a coalition. Then, we reformulate fourteen mainstream attribution methods into the Taylor framework and analyze these attribution methods in terms of rationale, fidelity, and limitation in the framework. Moreover, we establish three principles for a good attribution in the Taylor attribution framework, i.e., low approximation error, correct Taylor contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and reveal a positive correlation between the attribution performance and the number of principles followed by the attribution method via benchmarking on real-world datasets.


page 1

page 2

page 3

page 4


A Unified Taylor Framework for Revisiting Attribution Methods

Attribution methods have been developed to understand the decision makin...

Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Various attribution methods have been developed to explain deep neural n...

Towards Rigorous Interpretations: a Formalisation of Feature Attribution

Feature attribution is often loosely presented as the process of selecti...

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Explanation methods for machine learning models tend to not provide any ...

Improving Attribution Methods by Learning Submodular Functions

This work explores the novel idea of learning a submodular scoring funct...

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

RNN models have achieved the state-of-the-art performance in a wide rang...

Towards credible visual model interpretation with path attribution

Originally inspired by game-theory, path attribution framework stands ou...

Please sign up or login with your details

Forgot password? Click here to reset