Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection

05/15/2022
by   Fan Wang, et al.
Model attributions are important in deep neural networks because they help practitioners understand the models, but recent studies reveal that attributions can be easily perturbed by adding imperceptible noise to the input. The non-differentiable Kendall's rank correlation is a key performance index for attribution protection. In this paper, we first show that the expected Kendall's rank correlation is positively correlated with cosine similarity, and then show that the direction of the attribution is the key to attribution robustness. Based on these findings, we explore the vector space of attributions to explain the shortcomings of attribution defense methods that use the ℓ_p norm, and propose the integrated gradient regularizer (IGR), which maximizes the cosine similarity between natural and perturbed attributions. Our analysis further shows that IGR encourages neurons to keep the same activation states for natural samples and their corresponding perturbed samples, which is shown to induce robustness for gradient-based attribution methods. Our experiments on different models and datasets confirm our analysis of attribution protection and demonstrate a decent improvement in adversarial robustness.
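
The relationship between the two measures named in the abstract can be illustrated with a small, self-contained sketch. This is not the paper's code: the function names, the synthetic attribution vectors, and the use of Gaussian noise as a stand-in for an attribution attack are all assumptions made for illustration. It computes Kendall's rank correlation (the simple tau-a form) and cosine similarity between a "natural" and a "perturbed" attribution vector, and also shows a rescaled vector that is far away in ℓ_2 distance while keeping the same direction and ranking.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two attribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                score += 1  # pair ordered the same way in both vectors
            elif s < 0:
                score -= 1  # pair ordered in opposite ways
    return score / (n * (n - 1) / 2)


def l2_distance(a, b):
    """Euclidean distance, the p = 2 case of an l_p distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
# Hypothetical attribution vector; real attributions would come from a model.
natural = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Assumption: Gaussian noise stands in for an attribution attack.
for scale in (0.1, 0.5, 2.0):
    perturbed = [v + rng.gauss(0.0, scale) for v in natural]
    print(scale,
          round(cosine_similarity(natural, perturbed), 3),
          round(kendall_tau(natural, perturbed), 3))

# Rescaling changes the l_2 distance but neither the direction nor the ranking.
scaled = [2.0 * v for v in natural]
print(round(l2_distance(natural, scaled), 3),
      round(cosine_similarity(natural, scaled), 3),
      kendall_tau(natural, scaled))
```

In this toy setting, a vector can be distant under an ℓ_2 measure yet keep an identical ranking, which is consistent with the abstract's point that the direction of the attribution, rather than its magnitude, is what matters for rank-based robustness.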

research
03/01/2023

A Practical Upper Bound for the Worst-Case Attribution Deviations

Model attribution is a critical component of deep neural networks (DNNs)...
research
06/12/2023

On the Robustness of Removal-Based Feature Attributions

To explain complex models based on their inputs, many feature attributio...
research
03/14/2019

Attribution-driven Causal Analysis for Detection of Adversarial Examples

Attribution methods have been developed to explain the decision of a mac...
research
11/29/2022

Towards More Robust Interpretation via Local Gradient Alignment

Neural network interpretation methods, particularly feature attribution ...
research
06/11/2020

Smoothed Geometry for Robust Attribution

Feature attributions are a popular tool for explaining the behavior of D...
research
12/20/2019

When Explanations Lie: Why Modified BP Attribution Fails

Modified backpropagation methods are a popular group of attribution meth...
research
07/15/2022

Anomalous behaviour in loss-gradient based interpretability methods

Loss-gradients are used to interpret the decision making process of deep...
