A Practical Upper Bound for the Worst-Case Attribution Deviations

03/01/2023
by Fan Wang, et al.

Model attribution is a critical component of deep neural networks (DNNs) because it provides interpretability for complex models. Recent studies have drawn attention to the security of attribution methods, as they are vulnerable to attribution attacks that generate visually similar images with dramatically different attributions. Existing works have investigated empirical ways to improve the robustness of DNNs against such attacks; however, none of them explicitly quantifies the actual deviations of the attributions. In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound on the largest dissimilarity of attributions after a sample is perturbed by any noise within a certain region while the classification result remains the same. Based on this formulation, several practical approaches are introduced to bound the attribution deviations from above, using Euclidean distance and cosine similarity under both ℓ_2 and ℓ_∞-norm perturbation constraints. The bounds developed in our theoretical study are validated on various datasets against two different types of attacks (the PGD attack and the IFIA attribution attack). Over 10 million attacks in the experiments indicate that the proposed upper bounds effectively quantify the robustness of models in terms of their worst-case attribution dissimilarities.
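
The quantity the abstract describes can be sketched as a constrained optimization problem. The notation below is illustrative rather than the paper's own: f denotes the classifier, g(x) the attribution map of an input x, D the dissimilarity measure (Euclidean distance or cosine dissimilarity), and ε the perturbation budget.

    \max_{\delta}\; D\bigl(g(x),\, g(x+\delta)\bigr)
    \quad \text{s.t.} \quad
    \|\delta\|_{p} \le \epsilon, \qquad
    \arg\max_{k} f_{k}(x+\delta) = \arg\max_{k} f_{k}(x),
    \qquad p \in \{2, \infty\}.

The proposed upper bounds dominate the optimal value of this problem, so the worst-case attribution deviation can be quantified without solving the optimization exactly.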

Related research

05/15/2022 - Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection
Model attributions are important in deep neural networks as they aid pra...

06/12/2023 - On the Robustness of Removal-Based Feature Attributions
To explain complex models based on their inputs, many feature attributio...

06/11/2020 - Smoothed Geometry for Robust Attribution
Feature attributions are a popular tool for explaining the behavior of D...

03/31/2022 - Improving Adversarial Transferability via Neuron Attribution-Based Attacks
Deep neural networks (DNNs) are known to be vulnerable to adversarial ex...

11/29/2022 - Towards More Robust Interpretation via Local Gradient Alignment
Neural network interpretation methods, particularly feature attribution ...

09/20/2023 - Contrastive Pseudo Learning for Open-World DeepFake Attribution
The challenge in sourcing attribution for forgery faces has gained wides...

05/24/2019 - The advantages of multiple classes for reducing overfitting from test set reuse
Excessive reuse of holdout data can lead to overfitting. However, there ...
