
Evaluations and Methods for Explanation through Robustness Analysis

by Cheng-Yu Hsieh, et al.
Carnegie Mellon University

Among the many ways of interpreting a machine learning model, measuring the importance of a set of features tied to a prediction is probably one of the most intuitive. In this paper, we establish the link between a set of features and a prediction with a new evaluation criterion, robustness analysis, which measures the minimum distortion distance of an adversarial perturbation. By measuring the tolerance level for an adversarial attack, we can extract a set of features that provides the most robust support for a prediction; by setting a targeted adversarial attack, we can also extract a set of features that contrasts the current prediction with a target class. Applying this methodology to various prediction tasks across multiple domains, we observe that the derived explanations capture the significant feature set both qualitatively and quantitatively.
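To make the idea of a minimum distortion distance concrete, the sketch below works through a toy case that is not taken from the paper: a binary linear classifier sign(w·x + b), for which the smallest L2 perturbation that reaches the decision boundary while changing only a chosen feature subset S has the closed form |w·x + b| / ||w_S||. The function name `min_distortion`, the weights, and the input are all hypothetical and chosen only for illustration; for nonlinear models this distance generally has to be approximated with an adversarial attack instead.

```python
import math


def min_distortion(w, b, x, subset):
    """Smallest L2 perturbation, restricted to the feature indices in
    `subset`, that moves x onto the decision boundary of the linear
    classifier sign(w.x + b). Returns math.inf if no such perturbation
    exists (all weights in the subset are zero)."""
    margin = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    subset_norm = math.sqrt(sum(w[i] ** 2 for i in subset))
    return math.inf if subset_norm == 0 else margin / subset_norm


# Hypothetical toy model and input (assumptions, not data from the paper).
w = [3.0, 0.0, 4.0]
b = 0.0
x = [1.0, 1.0, 1.0]

# Distortion needed when only a single feature may be perturbed.
per_feature = {i: min_distortion(w, b, x, [i]) for i in range(len(w))}

# Features whose perturbation flips the prediction most cheaply come first.
ranking = sorted(per_feature, key=per_feature.get)

print(per_feature)
print(ranking)
```

In this toy setting, a feature that needs only a small distortion to change the prediction is one the prediction depends on heavily, while a feature with zero weight can never change it, so its distance is infinite. The same quantity computed over sets of features, rather than single features, is the kind of score the abstract refers to.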




Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models

Even today's most advanced machine learning models are easily fooled by a...

CARBEN: Composite Adversarial Robustness Benchmark

Prior literature on adversarial attack methods has mainly focused on att...

How Sampling Impacts the Robustness of Stochastic Neural Networks

Stochastic neural networks (SNNs) are random functions and predictions a...

Testing Robustness Against Unforeseen Adversaries

Considerable work on adversarial defense has studied robustness to a fix...

Interpreting and Evaluating Neural Network Robustness

Recently, adversarial deception becomes one of the most considerable thr...

Evaluating Deception Detection Model Robustness To Linguistic Variation

With the increasing use of machine-learning driven algorithmic judgement...

Explainability and Adversarial Robustness for RNNs

Recurrent Neural Networks (RNNs) yield attractive properties for constru...