Adversarial Infidelity Learning for Model Interpretation

06/09/2020
by   Jian Liang, et al.
0

Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check if the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score of each feature representing the data samples to explain how the model generates the specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. Also, we focus on the following setting: using selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we involve the output of the given model as an additional input to learn an explainer based on more accurate information. To learn the explainer, besides fidelity, we propose an Adversarial Infidelity Learning (AIL) mechanism to boost the explanation learning by screening relatively unimportant features. Through theoretical and experimental analysis, we show that our AIL mechanism can help learn the desired conditional distribution between selected features and targets. Moreover, we extend our framework by integrating efficient interpretation methods as proper priors to provide a warm start. Comprehensive empirical evaluation results are provided by quantitative metrics and human evaluation to demonstrate the effectiveness and superiority of our proposed method. Our code is publicly available online at https://github.com/langlrsw/MEED.

READ FULL TEXT

page 2

page 7

research
02/21/2018

Learning to Explain: An Information-Theoretic Perspective on Model Interpretation

We introduce instancewise feature selection as a methodology for model i...
research
04/26/2021

Instance-wise Causal Feature Selection for Model Interpretation

We formulate a causal extension to the recently introduced paradigm of i...
research
06/10/2023

Two-Stage Holistic and Contrastive Explanation of Image Classification

The need to explain the output of a deep neural network classifier is no...
research
04/03/2020

Unpack Local Model Interpretation for GBDT

A gradient boosting decision tree (GBDT), which aggregates a collection ...
research
09/27/2022

Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training

Several recent studies have shown that the use of extra in-distribution ...
research
06/26/2018

AutoSpearman: Automatically Mitigating Correlated Metrics for Interpreting Defect Models

The interpretation of defect models heavily relies on software metrics t...
research
06/18/2023

In-Process Global Interpretation for Graph Learning via Distribution Matching

Graphs neural networks (GNNs) have emerged as a powerful graph learning ...

Please sign up or login with your details

Forgot password? Click here to reset