Towards Robust Explanations for Deep Neural Networks

12/18/2020
by Ann-Kathrin Dombrowski, et al.

Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. However, their usefulness can be compromised because they are susceptible to manipulation. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches.
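To make the three defenses concrete, the following is a minimal PyTorch sketch, not the authors' reference implementation. It combines weight decay applied through the optimizer, softplus activations as a smooth replacement for ReLU, and a stochastic penalty on the Frobenius norm of the input Hessian of the true-class score. The function name `hessian_penalty`, the smoothing parameter `beta=5.0`, and the penalty weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Smooth activations: softplus with temperature beta approaches ReLU for
# large beta but stays twice differentiable, which limits the curvature
# an attacker can exploit. beta=5.0 is an illustrative choice.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.Softplus(beta=5.0),
    nn.Linear(256, 10),
)

# Weight decay is handled directly by the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def hessian_penalty(x, y, num_samples=1):
    """Hutchinson-style estimate of ||H||_F^2, where H is the Hessian of
    the true-class score with respect to the input x:
    E_v ||grad_x (v . grad_x f)||^2 with Rademacher vectors v."""
    x = x.clone().requires_grad_(True)
    scores = model(x)[torch.arange(x.shape[0]), y].sum()
    grad = torch.autograd.grad(scores, x, create_graph=True)[0]
    penalty = x.new_zeros(())
    for _ in range(num_samples):
        v = torch.randint_like(x, 2) * 2.0 - 1.0  # entries in {-1, +1}
        hv = torch.autograd.grad((grad * v).sum(), x, create_graph=True)[0]
        penalty = penalty + hv.pow(2).sum() / num_samples
    return penalty / x.shape[0]

def training_step(x, y, lam=0.1):
    loss = F.cross_entropy(model(x), y) + lam * hessian_penalty(x, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a sketch like this, the penalty weight and the softplus temperature trade off accuracy against explanation robustness: smaller `beta` yields a smoother, lower-curvature network, while a large `beta` recovers ReLU-like behavior.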

Related research

Don't Paint It Black: White-Box Explanations for Deep Learning in Computer Security (06/05/2019)
Deep learning is increasingly used as a basic building block of security...

Defend Deep Neural Networks Against Adversarial Examples via Fixed and Dynamic Quantized Activation Functions (07/18/2018)
Recent studies have shown that deep neural networks (DNNs) are vulnerabl...

Explanations can be manipulated and geometry is to blame (06/19/2019)
Explanation methods aim to make neural networks more trustworthy and int...

Do Explanations make VQA Models more Predictable to a Human? (10/29/2018)
A rich line of research attempts to make deep neural networks more trans...

Fairwashing Explanations with Off-Manifold Detergent (07/20/2020)
Explanation methods promise to make black-box classifiers more transpare...

FitAct: Error Resilient Deep Neural Networks via Fine-Grained Post-Trainable Activation Functions (12/27/2021)
Deep neural networks (DNNs) are increasingly being deployed in safety-cr...

Iterative and Adaptive Sampling with Spatial Attention for Black-Box Model Explanations (12/18/2019)
Deep neural networks have achieved great success in many real-world appl...