Evaluations and Methods for Explanation through Robustness Analysis

05/31/2020
by Cheng-Yu Hsieh, et al.
Google
Carnegie Mellon University

Among the many ways of interpreting a machine learning model, measuring the importance of a set of features tied to a prediction is probably one of the most intuitive. In this paper, we establish the link between a set of features and a prediction with a new evaluation criterion, robustness analysis, which measures the minimum distortion distance of an adversarial perturbation. By measuring the tolerance level for an adversarial attack, we can extract a set of features that provides the most robust support for a prediction, and we can also extract a set of features that contrasts the current prediction with a target class by using a targeted adversarial attack. Applying this methodology to various prediction tasks across multiple domains, we observe that the derived explanations do capture the significant feature set, both qualitatively and quantitatively.
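To make "minimum distortion distance" concrete, the sketch below works through a toy case that is not taken from the paper: a linear binary classifier sign(w·x + b), where the smallest L2 perturbation confined to a feature subset S that reaches the decision boundary has the closed form |w·x + b| / ||w_S||. The function name `min_distortion`, the weights, and the input are hypothetical, and the paper's method for general models is not reproduced here.

```python
import math

def min_distortion(w, b, x, subset):
    """Smallest L2 perturbation, restricted to the features in `subset`,
    that moves x onto the decision boundary of sign(w.x + b).

    Illustrative closed form for a linear model only (an assumption of
    this sketch, not the paper's general procedure).
    """
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(w[i] ** 2 for i in subset))
    if norm == 0.0:
        # Perturbing these features cannot change the score at all.
        return math.inf
    return abs(margin) / norm

# Hypothetical toy model and input.
w = [3.0, 0.0, 4.0]
b = 0.0
x = [1.0, 5.0, 1.0]

for subset in ([0, 2], [0], [1]):
    print(subset, min_distortion(w, b, x, subset))
```

In this toy setting, a subset with a small value is one where a small change to those features alone is enough to reach the boundary, while a subset with zero weight (feature 1 here) can never change the prediction.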

