"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations

11/15/2019
by Himabindu Lakkaraju, et al.

As machine learning black boxes are increasingly deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a human-interpretable manner. It has recently become apparent that even a high-fidelity explanation of a black-box ML model may not accurately reflect the biases in that model. As a consequence, explanations have the potential to mislead human users into trusting a problematic black box. In this work, we rigorously explore the notion of misleading explanations and how they influence user trust in black-box models. More specifically, we propose a novel theoretical framework for understanding and generating misleading explanations, and carry out a user study with domain experts to demonstrate how these explanations can be used to mislead users. Our work is the first to empirically establish how user trust in black-box models can be manipulated via misleading explanations.


Related research

11/06/2019
How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods
As machine learning black boxes are increasingly being deployed in domai...

06/24/2021
What will it take to generate fairness-preserving explanations?
In situations where explanations of black-box models may be useful, the ...

07/04/2017
Interpretable & Explorable Approximations of Black Box Models
We propose Black Box Explanations through Transparent Approximations (BE...

04/17/2019
"Why did you do that?": Explaining black box models with Inductive Synthesis
By their nature, the composition of black box models is opaque. This mak...

12/21/2018
Example and Feature importance-based Explanations for Black-box Machine Learning Models
As machine learning models become more accurate, they typically become m...

11/12/2020
Robust and Stable Black Box Explanations
As machine learning black boxes are increasingly being deployed in real-...

10/14/2021
Can Explanations Be Useful for Calibrating Black Box Models?
One often wants to take an existing, trained NLP model and use it on dat...
