Smoothed Geometry for Robust Attribution

06/11/2020
by Zifan Wang, et al.

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs. This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness. Building on a geometric understanding of these attacks presented in recent work, we identify Lipschitz continuity conditions on models' gradients that lead to robust gradient-based attributions, and observe that smoothness may also be related to the ability of an attack to transfer across multiple attribution methods. To mitigate these attacks in practice, we propose an inexpensive regularization method that promotes these conditions in DNNs, as well as a stochastic smoothing technique that does not require re-training. Our experiments on a range of image models demonstrate that both of these mitigations consistently improve attribution robustness, and confirm the role that smooth geometry plays in these attacks on real, large-scale models.
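The re-training-free stochastic smoothing mentioned in the abstract can be illustrated with a short sketch. The snippet below assumes a SmoothGrad-style recipe that averages gradient attributions over Gaussian-perturbed copies of the input; the function name, hyperparameters (sigma, n_samples), and PyTorch framing are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Minimal sketch of stochastic smoothing for gradient attributions,
# assuming a SmoothGrad-style average over Gaussian input perturbations.
import torch

def smoothed_gradient(model, x, target, sigma=0.1, n_samples=50):
    """Average input gradients over n_samples Gaussian-perturbed copies of x."""
    model.eval()
    total = torch.zeros_like(x)
    for _ in range(n_samples):
        # Perturb the input and track gradients with respect to the noisy copy.
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        score = model(noisy)[0, target]            # logit for the target class
        grad, = torch.autograd.grad(score, noisy)  # d(score) / d(input)
        total += grad
    return total / n_samples                       # smoothed attribution map
```

Because the smoothing operates purely at attribution time, it can be applied to any pretrained model, which is the appeal of this mitigation relative to the regularization approach.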


Related research

12/28/2020 · Enhanced Regularizers for Attributional Robustness
Deep neural networks are the default choice of learning models for compu...

03/20/2021 · Boundary Attributions Provide Normal (Vector) Explanations
Recent work on explaining Deep Neural Networks (DNNs) focuses on attribu...

05/24/2023 · Scale Matters: Attribution Meets the Wavelet Domain to Explain Model Sensitivity to Image Corruptions
Neural networks have shown remarkable performance in computer vision, bu...

03/01/2023 · A Practical Upper Bound for the Worst-Case Attribution Deviations
Model attribution is a critical component of deep neural networks (DNNs)...

07/05/2023 · DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications
Along with the successful deployment of deep neural networks in several ...

05/03/2023 · Defending against Insertion-based Textual Backdoor Attacks via Attribution
Textual backdoor attack, as a novel attack model, has been shown to be e...

05/15/2022 · Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection
Model attributions are important in deep neural networks as they aid pra...
