Robust Ranking Explanations

12/28/2022
by Chao Chen, et al.

Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. Existing works, however, measure explanation robustness with the ℓ_p-norm, which can be counter-intuitive for humans, who attend only to the top few salient features. We propose explanation ranking thickness as a more suitable metric of explanation robustness. We then present a new, practical adversarial attack goal: manipulating explanation rankings. To mitigate ranking-based attacks while remaining computationally feasible, we derive surrogate bounds of the thickness, whose direct evaluation would otherwise require expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack, confirming that explanation robustness can be measured by the thickness metric. Experiments on various network architectures and diverse datasets demonstrate the superiority of the proposed methods, and show that the widely adopted Hessian-based curvature-smoothing approaches are less robust than our method.
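The paper's formal thickness definition is not reproduced in this abstract, but the core idea, that a robust explanation should keep the same top-k salient features under small input perturbations, can be illustrated with a simplified Monte-Carlo sketch. The function names (`topk_overlap`, `ranking_thickness`), the uniform noise model, and the overlap-based score are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def topk_overlap(a, b, k):
    """Fraction of the top-k features of attribution `a` retained in the top-k of `b`."""
    ta = set(np.argsort(-a)[:k])
    tb = set(np.argsort(-b)[:k])
    return len(ta & tb) / k

def ranking_thickness(explain, x, k=5, eps=0.1, n_samples=100, rng=None):
    """Simplified Monte-Carlo estimate of top-k ranking thickness (assumed form):
    the average top-k overlap between the explanation of x and the explanations
    of perturbed copies of x drawn from a uniform eps-box around it."""
    rng = np.random.default_rng(rng)
    base = explain(x)
    scores = []
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        scores.append(topk_overlap(base, explain(x + delta), k))
    return float(np.mean(scores))
```

A thickness near 1.0 indicates that the top-k ranking is stable in the sampled neighborhood; an adversary manipulating rankings drives this score down. In the paper, such sampling is what the surrogate bounds are designed to avoid during training.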


Related research

- 06/14/2022: When adversarial attacks become interpretable counterfactual explanations
  We argue that, when learning a 1-Lipschitz neural network with the dual ...
- 01/16/2021: Multi-objective Search of Robust Neural Architectures against Multiple Types of Adversarial Attacks
  Many existing deep learning models are vulnerable to adversarial example...
- 06/24/2022: Robustness of Explanation Methods for NLP Models
  Explanation methods have emerged as an important tool to highlight the f...
- 12/16/2022: Robust Explanation Constraints for Neural Networks
  Post-hoc explanation methods are used with the intent of providing insig...
- 09/05/2022: "Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution
  Understanding the decision process of neural networks is hard. One vital...
- 07/13/2020: A simple defense against adversarial attacks on heatmap explanations
  With machine learning models being used for more sensitive applications,...
- 03/30/2022: Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis
  Respiratory sound classification is an important tool for remote screeni...
