How to Manipulate CNNs to Make Them Lie: the GradCAM Case

07/25/2019
by Tom Viering, et al.

Recently many methods have been introduced to explain CNN decisions. However, it has been shown that some methods can be sensitive to manipulation of the input. We continue this line of work and investigate the explanation method GradCAM. Instead of manipulating the input, we consider an adversary that manipulates the model itself to attack the explanation. By changing weights and architecture, we demonstrate that it is possible to generate any desired explanation, while leaving the model's accuracy essentially unchanged. This illustrates that GradCAM cannot explain the decision of every CNN and provides a proof of concept showing that it is possible to obfuscate the inner workings of a CNN. Finally, we combine input and model manipulation. To this end we put a backdoor in the network: the explanation is correct unless there is a specific pattern present in the input, which triggers a malicious explanation. Our work raises new security concerns, especially in settings where explanations of models may be used to make decisions, such as in the medical domain.
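For context on what the adversary is targeting, here is a minimal sketch of the GradCAM computation itself: each convolutional feature map is weighted by the spatially averaged gradient of the class score with respect to that map, the weighted maps are summed, and a ReLU keeps only positively contributing regions. The toy inputs and the pure-Python `grad_cam` helper below are illustrative assumptions, not code from the paper.

```python
# Minimal GradCAM sketch on toy data (pure Python, no deep-learning framework).
# activations, gradients: lists of K feature maps, each an H x W nested list.
# gradients[k] holds d(class score)/d(activations[k]).

def grad_cam(activations, gradients):
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * W for _ in range(H)]
    for k in range(K):
        # alpha_k: global-average-pooled gradient for feature map k
        alpha = sum(sum(row) for row in gradients[k]) / (H * W)
        for i in range(H):
            for j in range(W):
                heatmap[i][j] += alpha * activations[k][i][j]
    # ReLU: keep only regions with positive influence on the class score
    return [[max(0.0, v) for v in row] for row in heatmap]

# Toy example: two 2x2 feature maps with uniform gradients of +1 and -1,
# so the first map is kept and the second is suppressed by the ReLU.
acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # -> [[1.0, 0.0], [0.0, 1.0]]
```

Because the heatmap is assembled entirely from the model's own activations and gradients, an adversary who controls the weights and architecture also controls every input to this formula, which is what makes the attack described in the abstract possible.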

