Certifiably Robust Interpretation via Rényi Differential Privacy

07/04/2021
by Ao Liu, et al.

Motivated by the recent discovery that the interpretation maps of CNNs can easily be manipulated by adversarial attacks against network interpretability, we study the problem of interpretation robustness from the new perspective of Rényi differential privacy (RDP). The advantages of Rényi-Robust-Smooth, our RDP-based interpretation method, are threefold. First, it offers provable and certifiable top-k robustness: the top-k important attributions of the interpretation map are provably robust under any input perturbation with bounded ℓ_d-norm (for any d ≥ 1, including d = ∞). Second, our method is experimentally ∼10% more robust than existing approaches in terms of the top-k attributions; remarkably, the accuracy of Rényi-Robust-Smooth also outperforms that of existing approaches. Third, our method provides a smooth tradeoff between robustness and computational efficiency. Experimentally, its top-k attributions are twice as robust as those of existing approaches when computational resources are highly constrained.
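
The abstract does not spell out the mechanism, but RDP-based certification is typically obtained through a randomized-smoothing-style construction: average the attribution map over Gaussian-perturbed copies of the input and report the top-k coordinates of the average, with the noise scale and sample count trading robustness against compute. The sketch below illustrates only this generic construction, not the paper's actual algorithm; the names renyi_robust_smooth, attribution_fn, sigma, and n_samples are hypothetical, and the Rényi-divergence accounting that yields the formal certificate is not reproduced here.

import numpy as np

def renyi_robust_smooth(attribution_fn, x, sigma=0.1, n_samples=100, k=10, rng=None):
    """Smooth an attribution map by averaging over Gaussian-perturbed inputs.

    attribution_fn: maps an input array to a per-feature importance array
                    of the same shape (e.g., a gradient-based saliency map).
    sigma:          std of the Gaussian noise; larger sigma gives stronger
                    (hypothetical) certificates but coarser maps.
    n_samples:      Monte Carlo samples; trades compute for estimate quality.
    Returns the smoothed map and the indices of its top-k attributions.
    """
    rng = np.random.default_rng() if rng is None else rng
    smoothed = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        # Perturb the input with isotropic Gaussian noise and accumulate
        # the attribution map computed on the noisy copy.
        noisy = x + rng.normal(scale=sigma, size=x.shape)
        smoothed += attribution_fn(noisy)
    smoothed /= n_samples
    # Indices of the k largest smoothed attributions, in descending order.
    top_k = np.argsort(smoothed.ravel())[::-1][:k]
    return smoothed, top_k

# Toy usage with a stand-in gradient-times-input attribution for f(x) = x @ w.
w = np.array([3.0, -1.0, 0.5, 2.0])
grad_attr = lambda x: x * w
_, top2 = renyi_robust_smooth(grad_attr, np.ones(4), sigma=0.05, n_samples=500, k=2)

In this reading, n_samples is the knob behind the robustness/efficiency tradeoff the abstract mentions: fewer noisy samples cost less compute but give a noisier estimate of the smoothed map, and hence weaker top-k stability.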

Related research:

- 05/27/2019: Provable robustness against all adversarial l_p-perturbations for p ≥ 1
- 06/26/2020: Proper Network Interpretability Helps Adversarial Robustness in Classification
- 12/14/2020: Robustness Threats of Differential Privacy
- 09/07/2022: Bayesian and Frequentist Semantics for Common Variations of Differential Privacy: Applications to the 2020 Census
- 01/03/2022: On robustness and local differential privacy
- 11/29/2022: Towards More Robust Interpretation via Local Gradient Alignment
