Robust Explanation Constraints for Neural Networks

12/16/2022
by Matthew Wicker, et al.

Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs. However, popular explanation methods have been found to be fragile to minor perturbations of input features or model parameters. Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bounds the largest change an adversary can make to a gradient-based explanation via bounded manipulation of either the input features or the model parameters. By propagating a compact input or parameter set as symbolic intervals through the forward and backward computations of the neural network, we can formally certify the robustness of gradient-based explanations. Our bounds are differentiable, hence we can incorporate provable explanation robustness into neural network training. Empirically, our method surpasses the robustness provided by previous heuristic approaches. We find that our training method is the only one able to learn neural networks with certificates of explanation robustness across all six datasets tested.
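The interval-propagation idea described in the abstract can be illustrated compactly. The sketch below is not the authors' code: it assumes a one-hidden-layer ReLU network, NumPy, and illustrative names throughout. It propagates an epsilon-ball around the input as intervals through the forward pass, then through the backward pass, giving element-wise lower and upper bounds on the input gradient, i.e. on a simple saliency-map explanation.

```python
# Minimal sketch (assumptions, not the paper's implementation): interval bound
# propagation through the forward and backward pass of a one-hidden-layer ReLU
# network, bounding how much the input gradient can move under an epsilon-ball
# input perturbation.
import numpy as np

def interval_matvec(A, lo, hi):
    """Bounds on A @ v for any v with lo <= v <= hi (elementwise)."""
    A_pos, A_neg = np.maximum(A, 0.0), np.minimum(A, 0.0)
    return A_pos @ lo + A_neg @ hi, A_pos @ hi + A_neg @ lo

def gradient_bounds(W1, b1, w2, x, eps):
    # Forward pass: interval bounds on pre-activations z1 = W1 @ x' + b1
    # for every x' in [x - eps, x + eps].
    z_lo, z_hi = interval_matvec(W1, x - eps, x + eps)
    z_lo, z_hi = z_lo + b1, z_hi + b1

    # ReLU derivative per hidden unit: 1 if surely active, 0 if surely
    # inactive, and the interval [0, 1] when the bounds do not resolve the sign.
    d_lo = (z_lo > 0).astype(float)
    d_hi = (z_hi > 0).astype(float)

    # Backward pass: the input gradient is W1.T @ (relu'(z1) * w2).
    # Elementwise interval product with the fixed output weights w2.
    g_lo = np.minimum(d_lo * w2, d_hi * w2)
    g_hi = np.maximum(d_lo * w2, d_hi * w2)

    # Propagate through the fixed matrix W1.T to bound each saliency entry.
    return interval_matvec(W1.T, g_lo, g_hi)

# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
W1, b1, w2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=8)
x = rng.normal(size=4)
lo, hi = gradient_bounds(W1, b1, w2, x, eps=0.05)
print("saliency lower bounds:", lo)
print("saliency upper bounds:", hi)
```

If the resulting bounds are tight around the nominal gradient, no admissible input perturbation can change the explanation by more than the bound width; applying the same interval machinery to a parameter set instead of an input set bounds explanation changes under model-parameter manipulation.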


Related research:

06/19/2019 · Explanations can be manipulated and geometry is to blame
Explanation methods aim to make neural networks more trustworthy and int...

06/24/2022 · Robustness of Explanation Methods for NLP Models
Explanation methods have emerged as an important tool to highlight the f...

02/21/2021 · Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
As machine learning black boxes are increasingly being deployed in criti...

06/15/2022 · The Manifold Hypothesis for Gradient-Based Explanations
When do gradient-based explanation algorithms provide meaningful explana...

12/28/2022 · Robust Ranking Explanations
Gradient-based explanation is the cornerstone of explainable deep networ...

12/18/2020 · Towards Robust Explanations for Deep Neural Networks
Explanation methods shed light on the decision process of black-box clas...

05/14/2020 · Distilling neural networks into skipgram-level decision lists
Several previous studies on explanation for recurrent neural networks fo...
