Trustworthy Convolutional Neural Networks: A Gradient Penalized-based Approach

by Nicholas Halliwell, et al.

Convolutional neural networks (CNNs) are commonly used for image classification. Saliency methods can interpret CNNs post hoc by identifying the pixels most relevant to a prediction, following the flow of gradients. However, even when a CNN classifies an image correctly, the underlying saliency map can be erroneous, which casts doubt on the validity of the model and its interpretation. We propose a novel approach for training trustworthy CNNs that penalizes parameter choices producing inaccurate saliency maps during training. We add a penalty term for inaccurate saliency maps produced when the predicted label is correct, a penalty term for accurate saliency maps produced when the predicted label is incorrect, and a regularization term penalizing overly confident saliency maps. Experiments show increased classification performance, user engagement, and trust.
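To make the three penalty terms concrete, below is a minimal sketch of such a composite loss for a toy logistic model, where the input-gradient saliency map can be written in closed form. The function names, the binary relevance mask, the normalization, and the exact penalty forms are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def saliency(w, x):
    # For a logistic model p = sigmoid(w.x), the input-gradient saliency
    # is |dp/dx| = sigmoid'(w.x) * |w|  (computable analytically here).
    p = sigmoid(w @ x)
    return np.abs(p * (1.0 - p) * w)

def trust_loss(w, x, y, mask, lam1=1.0, lam2=1.0, lam3=0.1):
    """Cross-entropy plus three illustrative saliency penalties.

    mask: 1 where a pixel is truly relevant to the label, 0 elsewhere
    (an assumed ground-truth saliency annotation)."""
    p = sigmoid(w @ x)
    ce = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1.0 - p + 1e-12))

    s = saliency(w, x)
    s = s / (s.sum() + 1e-12)          # normalize to a distribution
    correct = (p > 0.5) == bool(y)

    off_mask = (s * (1 - mask)).sum()  # saliency mass on irrelevant pixels
    on_mask = (s * mask).sum()         # saliency mass on relevant pixels

    # Penalty 1: correct label, but the saliency map is inaccurate
    # (mass falls on irrelevant pixels).
    pen1 = lam1 * off_mask if correct else 0.0
    # Penalty 2: incorrect label, yet the saliency map looks accurate
    # (mass falls on the truly relevant pixels).
    pen2 = lam2 * on_mask if not correct else 0.0
    # Penalty 3: discourage overly confident (sharply peaked) saliency
    # maps by penalizing low entropy of the saliency distribution.
    entropy = -(s * np.log(s + 1e-12)).sum()
    pen3 = -lam3 * entropy

    return ce + pen1 + pen2 + pen3
```

Under this sketch, two parameter vectors that classify an input equally well are distinguished by where their saliency mass lands: weights concentrated on the masked (relevant) pixels incur no off-mask penalty, while weights of the same magnitude on irrelevant pixels are penalized.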







Related research:

- Evaluating Input Perturbation Methods for Interpreting CNNs and Saliency Map Comparison
- Evaluating Saliency Map Explanations for Convolutional Neural Networks: A User Study
- Backdoor Attacks on the DNN Interpretation System
- Full-Jacobian Representation of Neural Networks
- Robust Saliency Maps with Decoy-enhanced Saliency Score
- TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency
- Rethinking Positive Aggregation and Propagation of Gradients in Gradient-based Saliency Methods