Causal Proxy Models for Concept-Based Model Explanations

09/28/2022
by   Zhengxuan Wu, et al.

Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model 𝒩 because it is trained to have the same actual input/output behavior as 𝒩 while creating neural representations that can be intervened upon to simulate the counterfactual input/output behavior of 𝒩. Furthermore, we show that the best CPM for 𝒩 performs comparably to 𝒩 in making factual predictions, which means that the CPM can simply replace 𝒩, leading to more explainable deployed models. Our code is available at https://github.com/frankaging/Causal-Proxy-Model.
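The CPM training objective described above can be sketched with toy networks. The snippet below is a minimal illustration, not the paper's implementation: the two-layer networks `forward_N` (the "black box") and `forward_P` (the proxy), the reserved hidden slice `concept_dims`, and the inputs are all hypothetical stand-ins. It shows the two loss terms: the proxy must match 𝒩's factual output on an input, and, after an interchange intervention that overwrites the concept-aligned hidden units with those computed from an approximate counterfactual, it must match 𝒩's output on that counterfactual.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# Hypothetical tiny networks: D input dims, H hidden units, C classes.
D, H, C = 8, 6, 3
W1_N, W2_N = rng.normal(size=(D, H)), rng.normal(size=(H, C))
W1_P, W2_P = rng.normal(size=(D, H)), rng.normal(size=(H, C))

def forward_N(x):
    """Stand-in for the black-box model N."""
    return softmax(np.tanh(x @ W1_N) @ W2_N)

def forward_P(x, swap_from=None, concept_dims=slice(0, 2)):
    """Proxy forward pass. If swap_from is given, the hidden units
    reserved for the target concept are overwritten with the hidden
    state computed from the (approximate) counterfactual input."""
    h = np.tanh(x @ W1_P)
    if swap_from is not None:
        h_src = np.tanh(swap_from @ W1_P)
        h = h.copy()
        h[..., concept_dims] = h_src[..., concept_dims]
    return softmax(h @ W2_P)

def ce(p, q):
    """Mean cross-entropy between output distributions."""
    return -(p * np.log(q + 1e-12)).sum(-1).mean()

x  = rng.normal(size=(4, D))   # factual inputs
xc = rng.normal(size=(4, D))   # approximate counterfactual inputs

# CPM objective (sketch): mimic N on factual inputs, and match N's
# counterfactual outputs after the interchange intervention.
loss_factual = ce(forward_N(x),  forward_P(x))
loss_counter = ce(forward_N(xc), forward_P(x, swap_from=xc))
loss = loss_factual + loss_counter
```

In the paper, the proxy is a neural language model trained with gradient descent on both terms jointly; here the weights are frozen and only the loss computation is shown. The key design choice is that the counterfactual term never requires a true counterfactual text, only an approximate one (human-written or sampled via metadata).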
