Towards Robust Explanations for Deep Neural Networks

12/18/2020
by Ann-Kathrin Dombrowski et al.

Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulation. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three techniques to boost robustness against manipulation: training with weight decay, smoothing the activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches.
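The three techniques named in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration under assumptions of my own, not the authors' code: the architecture, the Softplus `beta`, the penalty weight `1e-3`, and the stochastic Frobenius-norm estimator for the input Hessian are all illustrative choices.

```python
# Hedged sketch (PyTorch assumed): the three robustness techniques from
# the abstract. Architecture, beta, and penalty weights are illustrative
# assumptions, not the paper's training recipe.
import torch
import torch.nn as nn

torch.manual_seed(0)

# (1) Smooth activations: Softplus with moderate beta is a smooth,
#     twice-differentiable surrogate for ReLU.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.Softplus(beta=5.0),
    nn.Linear(32, 2),
)

# (2) Weight decay: an L2 penalty on the weights, applied via the optimizer.
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

def hessian_penalty(model, x, y):
    """(3) Stochastic estimate of the squared Frobenius norm of the
    input Hessian of the loss: for v ~ N(0, I), E[||Hv||^2] = ||H||_F^2.
    Double backprop keeps the penalty differentiable in the parameters."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    (g,) = torch.autograd.grad(loss, x, create_graph=True)
    v = torch.randn_like(x)
    (hv,) = torch.autograd.grad((g * v).sum(), x, create_graph=True)
    return hv.pow(2).sum()

x = torch.randn(4, 10)
y = torch.randint(0, 2, (4,))

pen = hessian_penalty(model, x, y)
loss = nn.functional.cross_entropy(model(x), y) + 1e-3 * pen
opt.zero_grad()
loss.backward()
opt.step()
```

Note that the smooth activation in (1) is what makes the second-order term in (3) meaningful: with plain ReLU the input Hessian would vanish almost everywhere.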


Related research

06/05/2019 · Don't Paint It Black: White-Box Explanations for Deep Learning in Computer Security
Deep learning is increasingly used as a basic building block of security...

07/18/2018 · Defend Deep Neural Networks Against Adversarial Examples via Fixed and Dynamic Quantized Activation Functions
Recent studies have shown that deep neural networks (DNNs) are vulnerabl...

06/19/2019 · Explanations can be manipulated and geometry is to blame
Explanation methods aim to make neural networks more trustworthy and int...

01/23/2023 · SpArX: Sparse Argumentative Explanations for Neural Networks
Neural networks (NNs) have various applications in AI, but explaining th...

12/16/2022 · Robust Explanation Constraints for Neural Networks
Post-hoc explanation methods are used with the intent of providing insig...

07/05/2023 · DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications
Along with the successful deployment of deep neural networks in several ...
