Interpretations Cannot Be Trusted: Stealthy and Effective Adversarial Perturbations against Interpretable Deep Learning

11/29/2022
by Eldor Abdukhamidov, et al.

Deep learning methods have attracted increasing attention across applications due to their outstanding performance. To examine whether this performance stems from the proper use of data artifacts and an accurate formulation of the given task, interpretation models have become a crucial component in the development of deep learning-based systems. Interpretation models enable understanding of the inner workings of deep learning models and offer a sense of security by detecting the misuse of artifacts in the input data. Like prediction models, however, interpretation models are also susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge^+, that deceive both the target deep learning model and its coupled interpretation model. We assess the effectiveness of the proposed attacks against two deep learning architectures coupled with four interpretation models representing different categories of interpretation techniques. Our experiments implement the attacks using various attack frameworks, and we explore potential countermeasures against such attacks. Our analysis demonstrates that the attacks effectively deceive the deep learning models and their interpreters, and it offers insights for both improving and circumventing such attacks.

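The abstract describes the attack only at a high level: a perturbation must simultaneously flip the classifier's prediction and leave the interpreter's attribution map looking benign. As a rough illustration, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of one way such a joint objective can be set up: a PGD-style loop that minimizes a prediction loss plus an interpretation-consistency penalty, with the perturbation weighted toward edge regions of the benign attribution map. The names advedge_attack, interpreter, and lam, the Sobel-based edge weighting, and all hyperparameter values are illustrative assumptions.

import torch
import torch.nn.functional as F

def advedge_attack(model, interpreter, x, target, eps=8/255, alpha=1/255,
                   steps=100, lam=0.1):
    # x: input image of shape (1, 3, H, W) in [0, 1];
    # target: adversarial target label, LongTensor of shape (1,).
    # interpreter(model, x) is assumed to return a differentiable
    # attribution map of shape (1, 1, H, W).
    benign_map = interpreter(model, x).detach()

    # Edge weights from the benign attribution map via a Sobel filter:
    # concentrating the perturbation along edges is what keeps it stealthy.
    sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    gx = F.conv2d(benign_map, sobel.view(1, 1, 3, 3), padding=1)
    gy = F.conv2d(benign_map, sobel.t().contiguous().view(1, 1, 3, 3), padding=1)
    edge_w = (gx**2 + gy**2).sqrt()
    edge_w = edge_w / (edge_w.max() + 1e-12)

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        x_adv = (x + edge_w * delta).clamp(0, 1)
        # Joint objective: fool the classifier toward the target label
        # while keeping the attribution map close to the benign one.
        pred_loss = F.cross_entropy(model(x_adv), target)
        int_loss = (interpreter(model, x_adv) - benign_map).pow(2).sum()
        loss = pred_loss + lam * int_loss
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: targeted attack
            delta.clamp_(-eps, eps)             # L-infinity budget
        delta.grad.zero_()
    return (x + edge_w * delta.detach()).clamp(0, 1)

The sketch assumes the interpreter is differentiable (e.g., a gradient saliency map built with create_graph=True); a non-differentiable interpreter would require a surrogate for the interpretation loss to backpropagate.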
Related research

- Interpretable Deep Learning under Fire (12/03/2018): Providing explanations for complicated deep neural network (DNN) models ...
- Single-Class Target-Specific Attack against Interpretable Deep Learning Systems (07/12/2023): In this paper, we present a novel Single-class target-specific Adversari...
- Certifiably Robust Interpretation in Deep Learning (05/28/2019): Although gradient-based saliency maps are popular methods for deep learn...
- Wolf in Sheep's Clothing - The Downscaling Attack Against Deep Learning Applications (12/21/2017): This paper considers security risks buried in the data processing pipeli...
- Being Patient and Persistent: Optimizing An Early Stopping Strategy for Deep Learning in Profiled Attacks (11/29/2021): The absence of an algorithm that effectively monitors deep learning mode...
- Robustness of deep learning algorithms in astronomy – galaxy morphology studies (11/01/2021): Deep learning models are being increasingly adopted in wide array of sci...
- A comparative study for interpreting deep learning prediction of the Parkinson's disease diagnosis from SPECT imaging (08/23/2019): The application of deep learning to single-photon emission computed tomo...
