Brittle interpretations: The Vulnerability of TCAV and Other Concept-based Explainability Tools to Adversarial Attack

10/14/2021
by Davis Brown, et al.

Methods for model explainability have become increasingly critical for testing the fairness and soundness of deep learning. A number of explainability techniques have been developed which use a set of examples to represent a human-interpretable concept in a model's activations. In this work we show that these explainability methods can suffer the same vulnerability to adversarial attacks as the models they are meant to analyze. We demonstrate this phenomenon on two well-known concept-based approaches to the explainability of deep learning models: TCAV and faceted feature visualization. We show that by carefully perturbing the examples of the concept that is being investigated, we can radically change the output of the interpretability method, e.g., showing that stripes are not an important factor in identifying images of a zebra. Our work highlights the fact that in safety-critical applications, there is a need for security around not only the machine learning pipeline but also the model interpretation process.
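To make the attack idea concrete, below is a minimal, hedged sketch of perturbing the concept examples to drive a TCAV-style score down. It is not the authors' implementation: it uses an untrained torchvision resnet18 as a stand-in model, random tensors as placeholder concept/random/class images, a hypothetical target class index, a simplified difference-of-means concept activation vector (TCAV proper trains a linear classifier and takes its normal), and a PGD-style attack on a soft surrogate of the score.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

torch.manual_seed(0)
model = resnet18(weights=None).eval()           # stand-in model (untrained here)
target_class = 340                              # hypothetical "zebra" class index

# Grab activations at an intermediate layer with a forward hook.
acts = {}
model.layer3.register_forward_hook(lambda m, i, o: acts.update(h=o))

def layer_acts(x):
    model(x)
    return acts["h"].flatten(1)                 # (N, D) features at the chosen layer

# Placeholder batches; in practice these are real concept / random / class images.
concept_imgs = torch.rand(8, 3, 64, 64)         # e.g. "stripes" concept examples
random_imgs  = torch.rand(8, 3, 64, 64)
class_imgs   = torch.rand(8, 3, 64, 64)         # images of the target class

# Quantities that stay fixed during the attack: the mean random activation and the
# gradients of the target-class logit w.r.t. the layer activations on class images.
with torch.no_grad():
    random_mean = layer_acts(random_imgs).mean(0)
logits = model(class_imgs)
g = torch.autograd.grad(logits[:, target_class].sum(), acts["h"])[0].flatten(1).detach()

def cav(concept_x):
    """Simplified CAV: normalized difference of mean activations.
    (TCAV proper uses the normal of a trained linear classifier.)"""
    return F.normalize(layer_acts(concept_x).mean(0) - random_mean, dim=0)

def tcav_score(v):
    """Fraction of class images whose target logit increases along the CAV."""
    return ((g * v).sum(1) > 0).float().mean().item()

print("TCAV score before attack:", tcav_score(cav(concept_imgs)))

# PGD-style attack on the *concept examples*: minimize the mean directional
# derivative so the concept looks unimportant, under an L_inf budget eps.
eps, step, n_steps = 8 / 255, 2 / 255, 20
delta = torch.zeros_like(concept_imgs, requires_grad=True)
for _ in range(n_steps):
    v = cav(concept_imgs + delta)
    loss = (g * v).sum(1).mean()                # soft surrogate of the TCAV score
    grad = torch.autograd.grad(loss, delta)[0]
    with torch.no_grad():
        delta -= step * grad.sign()             # descend: push the score toward 0
        delta.clamp_(-eps, eps)                 # (clamping images to [0,1] omitted)

print("TCAV score after attack: ", tcav_score(cav(concept_imgs + delta)))
```

Under these assumptions, the perturbations stay within a small L-infinity ball around the original concept examples (visually near-identical images), yet the score reported by the interpretability method can change substantially, which is the failure mode the paper describes.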


