Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation

01/18/2021
by Fan Yang, et al.

With the wide use of deep neural networks (DNNs), model interpretability has become a critical concern, since explainable decisions are preferred in high-stakes scenarios. Current interpretation techniques mainly focus on feature attribution, which is limited in indicating why and how particular features relate to a prediction. To this end, an intriguing class of explanations, named counterfactuals, has been developed to further explore "what-if" circumstances for interpretation and to enable reasoning over black-box models. However, generating counterfactuals for raw data instances (i.e., text and images) is still at an early stage, owing to the challenges of high data dimensionality and semantically uninformative raw features. In this paper, we design a framework to generate counterfactuals specifically for raw data instances with the proposed Attribute-Informed Perturbation (AIP). By utilizing generative models conditioned on different attributes, counterfactuals with desired labels can be obtained effectively and efficiently. Instead of directly modifying instances in the data space, we iteratively optimize over the constructed attribute-informed latent space, where features are more robust and semantic. Experimental results on real-world text and image data demonstrate the effectiveness, sample quality, and efficiency of the designed framework, and show its superiority over other alternatives. In addition, we introduce practical applications based on our framework, indicating its potential beyond model interpretability.
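
The abstract sketches the core idea: rather than perturbing the raw input directly, AIP searches the attribute-informed latent space of a conditional generative model until the black-box classifier outputs the desired label while the latent code stays close to that of the original instance. Below is a minimal sketch of such a latent-space counterfactual search, assuming hypothetical components: `encoder`, `generator`, `classifier`, and `attrs`, along with the Adam-based loop and the proximity weight `lam`, are illustrative placeholders rather than the paper's released implementation.

```python
# Minimal sketch of an attribute-informed latent-space counterfactual search.
# Assumed (hypothetical) components: `encoder` maps an instance to a latent code,
# `generator` is a pretrained attribute-conditioned model G(z, a) -> x, and
# `classifier` is the black-box DNN f(x) -> logits.
import torch
import torch.nn.functional as F

def attribute_informed_counterfactual(x, target_label, encoder, generator,
                                      classifier, attrs, steps=200, lr=0.05,
                                      lam=0.1):
    """Search the attribute-informed latent space for a counterfactual of x."""
    with torch.no_grad():
        z0 = encoder(x)                      # latent code of the query instance
    z = z0.clone().requires_grad_(True)      # perturbed latent code to optimize
    a = attrs.clone().requires_grad_(True)   # attribute vector is also adjustable
    opt = torch.optim.Adam([z, a], lr=lr)
    target = torch.tensor([target_label])

    for _ in range(steps):
        x_cf = generator(z, a)               # decode the candidate counterfactual
        logits = classifier(x_cf)
        if logits.argmax(dim=1).item() == target_label:
            break                            # stop once the label has flipped
        opt.zero_grad()
        # Push the prediction toward the desired label ...
        loss_cls = F.cross_entropy(logits, target)
        # ... while staying close to the original latent code (proximity).
        loss_prox = (z - z0).pow(2).mean()
        (loss_cls + lam * loss_prox).backward()
        opt.step()
    return generator(z, a).detach(), a.detach()
```

Optimizing in the latent space rather than the pixel or token space is what keeps the resulting counterfactuals semantically coherent: every candidate is decoded by the generative model, so the search can only move along attribute-level variations rather than arbitrary raw-feature noise.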
