Four Axiomatic Characterizations of the Integrated Gradients Attribution Method

by Daniel Lundstrom, et al.
University of Southern California

Deep neural networks have produced significant progress among machine learning models in terms of accuracy and functionality, but their inner workings are still largely unknown. Attribution methods seek to shine a light on these "black box" models by indicating how much each input contributed to a model's output. The Integrated Gradients (IG) method is a state-of-the-art baseline attribution method in the axiomatic vein, meaning it is designed to conform to particular principles of attribution. We present four axiomatic characterizations of IG, establishing IG as the unique method to satisfy different sets of axioms among a class of attribution methods.
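For context, IG attributes a model's output to its inputs by integrating gradients along the straight-line path from a baseline to the input. Below is a minimal sketch of the standard Riemann-sum approximation; the function name `integrated_gradients` and the `grad_fn` callback are illustrative choices, not an API from the paper.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG attributions for input x relative to baseline.

    grad_fn: callable returning the gradient of the model output
             with respect to its input, evaluated at a given point.
    """
    # Midpoint Riemann sum over the straight-line path
    # baseline + alpha * (x - baseline), alpha in (0, 1).
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # Scale the averaged gradients by the input-baseline difference.
    return (x - baseline) * total / steps
```

For a model F(x) = x1^2 + x2^2 (gradient 2x) with a zero baseline, the attributions come out to x_i^2, and they sum to F(x) - F(baseline), illustrating the completeness axiom that characterizes IG.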



Baseline Computation for Attribution Methods Based on Interpolated Inputs

We discuss a way to find a well-behaved baseline for attribution methods...

Case Study: Explaining Diabetic Retinopathy Detection Deep CNNs via Integrated Gradients

In this report, we applied integrated gradients to explaining a neural n...

IS-CAM: Integrated Score-CAM for axiomatic-based explanations

Convolutional Neural Networks have been known as black-box models as hum...

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

As the efficacy of deep learning (DL) grows, so do concerns about the la...

Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models

In this paper, we introduce Integrated Directional Gradients (IDG), a me...

Impossibility Theorems for Feature Attribution

Despite a sea of interpretability methods that can produce plausible exp...

Robust Attribution Regularization

An emerging problem in trustworthy machine learning is to train models t...
