A Unified Taylor Framework for Revisiting Attribution Methods

08/21/2020
by   Huiqi Deng, et al.
0

Attribution methods have been developed to understand the decision making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods often built upon empirical intuitions and heuristics. There still lacks a unified framework that can provide deeper understandings of their rationales, theoretical fidelity, and limitations. To bridge the gap, we present a Taylor attribution framework to theoretically characterize the fidelity of explanations. The key idea is to decompose model behaviors into first-order, high-order independent, and high-order interactive terms, which makes clearer attribution of high-order effects and complex feature interactions. Three desired properties are proposed for Taylor attributions, i.e., low model approximation error, accurate assignment of independent and interactive effects. Moreover, several popular attribution methods are mathematically reformulated under the unified Taylor attribution framework. Our theoretical investigations indicate that these attribution methods implicitly reflect high-order terms involving complex feature interdependencies. Among these methods, Integrated Gradient is the only one satisfying the proposed three desired properties. New attribution methods are proposed based on Integrated Gradient by utilizing the Taylor framework. Experimental results show that the proposed method outperforms the existing ones in model interpretations.

READ FULL TEXT
research
05/28/2021

A General Taylor Framework for Unifying and Revisiting Attribution Methods

Attribution methods provide an insight into the decision-making process ...
research
05/16/2023

The Weighted Möbius Score: A Unified Framework for Feature Attribution

Feature attribution aims to explain the reasoning behind a black-box mod...
research
07/12/2023

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Explanation methods for machine learning models tend to not provide any ...
research
04/19/2021

Improving Attribution Methods by Learning Submodular Functions

This work explores the novel idea of learning a submodular scoring funct...
research
07/19/2021

Path Integrals for the Attribution of Model Uncertainties

Enabling interpretations of model uncertainties is of key importance in ...
research
06/12/2023

On the Robustness of Removal-Based Feature Attributions

To explain complex models based on their inputs, many feature attributio...
research
05/19/2022

Towards a Theory of Faithfulness: Faithful Explanations of Differentiable Classifiers over Continuous Data

There is broad agreement in the literature that explanation methods shou...

Please sign up or login with your details

Forgot password? Click here to reset