Log In Sign Up

Improving Attribution Methods by Learning Submodular Functions

by   Piyushi Manupriya, et al.

This work explores the novel idea of learning a submodular scoring function to improve the specificity/selectivity of existing feature attribution methods. Submodular scores are natural for attribution as they are known to accurately model the principle of diminishing returns. A new formulation for learning a deep submodular set function that is consistent with the real-valued attribution maps obtained by existing attribution methods is proposed. This formulation not only ensures that the scores for the heat maps that include the highly attributed features across the existing methods are high, but also that the score saturates even for the most specific heat map. The final attribution value of a feature is then defined as the marginal gain in the induced submodular score of the feature in the context of other highly attributed features, thus decreasing the attribution of redundant yet discriminatory features. Experiments on multiple datasets illustrate that the proposed attribution method achieves higher specificity while not degrading the discriminative power.


page 2

page 13

page 14

page 19

page 21

page 23

page 25

page 26


Discriminative Attribution from Counterfactuals

We present a method for neural network interpretability by combining fea...

A General Taylor Framework for Unifying and Revisiting Attribution Methods

Attribution methods provide an insight into the decision-making process ...

Evaluating Feature Attribution Methods in the Image Domain

Feature attribution maps are a popular approach to highlight the most im...

Continuous Attribution of Episodical Outcomes for More Efficient and Targeted Online Measurement

Online experimentation platforms collect user feedback at low cost and l...

A Unified Taylor Framework for Revisiting Attribution Methods

Attribution methods have been developed to understand the decision makin...

Generating Attribution Maps with Disentangled Masked Backpropagation

Attribution map visualization has arisen as one of the most effective te...

Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models

In this paper, we introduce Integrated Directional Gradients (IDG), a me...