When and How to Fool Explainable Models (and Humans) with Adversarial Examples

07/05/2021
by Jon Vadillo, et al.

Reliable deployment of machine learning models such as neural networks remains challenging due to several limitations, chief among them the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this paper, we explore the possibilities and limits of adversarial attacks on explainable machine learning models. First, we extend the notion of adversarial examples to explainable machine learning scenarios, in which the inputs, the output classifications, and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing novel attack paradigms. In particular, our framework considers a wide range of relevant (yet often ignored) factors, such as the type of problem, the user's expertise, and the objective of the explanations, in order to identify the attack strategies that should be adopted in each scenario to successfully deceive the model (and the human). These contributions are intended to serve as a basis for a more rigorous and realistic study of adversarial examples in the field of explainable machine learning.
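To make the setting concrete, the following is a minimal PyTorch sketch, an illustration rather than the authors' proposed method, of the two ingredients the abstract refers to: a standard adversarial attack (FGSM) that perturbs an input to change the model's classification, and a simple gradient saliency map standing in for the explanation a human assessor would inspect. The names `model`, `x`, and `y` are assumed placeholders for any differentiable classifier, a single-example input batch, and its label tensor.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    # One-step Fast Gradient Sign Method: perturb x to increase the loss
    # of the true class y, within an L-infinity budget of epsilon.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

def saliency_map(model, x):
    # Gradient saliency: how sensitive the predicted-class score is to
    # each input feature. Stands in for the explanation shown to a human.
    x = x.clone().detach().requires_grad_(True)
    scores = model(x)
    scores[0, scores[0].argmax()].backward()
    return x.grad.abs().detach()

# Hypothetical usage: check whether the attack also shifts the explanation.
# x_adv = fgsm_attack(model, x, y)
# flipped = model(x_adv).argmax(1) != y               # classification changed?
# drift = (saliency_map(model, x) - saliency_map(model, x_adv)).abs().mean()
```

Under the paper's framing, a successful attack in this human-assessed setting would need both the flipped prediction and an explanation that still looks plausible to the assessor; the saliency drift above is one hypothetical proxy for measuring that second condition.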


Related research

10/02/2018
Adversarial Examples - A Complete Characterisation of the Phenomenon
We provide a complete characterisation of the phenomenon of adversarial ...

04/14/2020
Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions
Despite the remarkable performance and generalization levels of deep lea...

09/26/2019
Adversarial ML Attack on Self Organizing Cellular Networks
Deep Neural Networks (DNN) have been widely adopted in self-organizing n...

06/12/2023
When Vision Fails: Text Attacks Against ViT and OCR
While text-based machine learning models that operate on visual inputs o...

11/30/2017
ConvNets and ImageNet Beyond Accuracy: Explanations, Bias Detection, Adversarial Examples and Model Criticism
ConvNets and Imagenet have driven the recent success of deep learning fo...

02/19/2023
Adversarial Machine Learning: A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Example
Adversarial machine learning (AML) studies the adversarial phenomenon of...

09/07/2018
Detecting Potential Local Adversarial Examples for Human-Interpretable Defense
Machine learning models are increasingly used in the industry to make de...
