Explaining by Removing: A Unified Framework for Model Explanation

11/21/2020
by Ian Covert, et al.

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We establish a new class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 25 existing methods, including several of the most widely used approaches (SHAP, LIME, Meaningful Perturbations, permutation tests). This new class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
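To make the framework's three dimensions concrete, the sketch below implements permutation-style feature importance, one of the removal-based methods the abstract mentions: features are "removed" by permuting their values, the model behavior explained is held-out loss, and each feature's influence is summarized by the resulting increase in loss. This is a minimal illustrative sketch using scikit-learn, not the authors' implementation; the dataset and model are arbitrary choices for demonstration.

```python
# Illustrative sketch of a removal-based explanation (permutation importance).
# Removal: permute a feature's column. Behavior: held-out MSE. Summary: loss increase.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
base_loss = mean_squared_error(y_test, model.predict(X_test))

rng = np.random.default_rng(0)
importance = np.zeros(X_test.shape[1])
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # simulate removing feature j
    importance[j] = mean_squared_error(y_test, model.predict(X_perm)) - base_loss

print(importance)  # larger values indicate features whose removal hurts the model more
```

Other methods in the framework correspond to different choices along the same three axes, e.g., SHAP removes features by marginalizing them out and summarizes influence with Shapley values.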

