Causal Abstraction for Faithful Model Interpretation

01/11/2023
by Atticus Geiger, et al.

A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible but also consistent with the known but often opaque low-level causal details of the model. We argue that the theory of causal abstraction provides the mathematical foundations for the desired kinds of model explanations. In causal abstraction analysis, we use interventions on model-internal states to rigorously assess whether an interpretable high-level causal model is a faithful description of an AI model. Our contributions in this area are: (1) we generalize causal abstraction to cyclic causal structures and typed high-level variables; (2) we show how multi-source interchange interventions can be used to conduct causal abstraction analyses; (3) we define a notion of approximate causal abstraction that allows us to assess the degree to which a high-level causal model is a causal abstraction of a lower-level one; (4) we prove that constructive causal abstraction can be decomposed into three operations we refer to as marginalization, variable-merge, and value-merge; and (5) we formalize the XAI methods of LIME, causal effect estimation, causal mediation analysis, iterated nullspace projection, and circuit-based explanations as special cases of causal abstraction analysis.
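The idea of testing a high-level model with interventions on internal states can be illustrated with a small toy sketch. Everything in it is hypothetical and chosen only for illustration: the high-level model (A = W + X, B = Y + Z, O = A * B), the hand-written `low_level` function whose hidden units `h0` and `h1` stand in for a neural network's internal states, and the helper `interchange_accuracy`. The sketch performs a multi-source interchange intervention: A is fixed to the value it takes on one source input and B to the value it takes on a second source input, the aligned hidden units receive the same treatment in the low-level model, and the two outputs are compared on a base input. The fraction of agreeing (base, source, source) triples is used here as a simple stand-in for a graded, approximate notion of abstraction; it is an assumption of this sketch, not necessarily the paper's exact definition.

```python
import itertools
from typing import Dict, List, Mapping, Optional, Tuple

Inputs = Tuple[int, int, int, int]  # (w, x, y, z)


def high_level(inp: Inputs, fixed: Optional[Mapping[str, int]] = None) -> Dict[str, int]:
    """Hypothetical interpretable model: A = W + X, B = Y + Z, O = A * B.
    Variables listed in `fixed` are set by intervention instead of computed."""
    fixed = fixed or {}
    w, x, y, z = inp
    a = fixed.get("A", w + x)
    b = fixed.get("B", y + z)
    return {"A": a, "B": b, "O": a * b}


def low_level(inp: Inputs, fixed: Optional[Mapping[str, int]] = None) -> Dict[str, int]:
    """Hypothetical stand-in for an opaque network with hidden units h0, h1.
    Hidden units listed in `fixed` are overwritten by intervention."""
    fixed = fixed or {}
    w, x, y, z = inp
    h0 = fixed.get("h0", x + w)
    h1 = fixed.get("h1", z + y)
    return {"h0": h0, "h1": h1, "out": h0 * h1}


def interchange_accuracy(alignment: Mapping[str, str], inputs: List[Inputs]) -> float:
    """Share of (base, source_A, source_B) triples on which both models give the
    same output after a multi-source interchange intervention.
    `alignment` maps each high-level variable to the hidden unit hypothesized
    to encode it."""
    hits = 0
    total = 0
    for base, src_a, src_b in itertools.product(inputs, repeat=3):
        sources = {"A": src_a, "B": src_b}
        # High level: fix each variable to the value it takes on its own source.
        hi_fixed = {v: high_level(s)[v] for v, s in sources.items()}
        # Low level: fix the aligned hidden unit to its value on the same source.
        lo_fixed = {alignment[v]: low_level(s)[alignment[v]] for v, s in sources.items()}
        if high_level(base, hi_fixed)["O"] == low_level(base, lo_fixed)["out"]:
            hits += 1
        total += 1
    return hits / total


if __name__ == "__main__":
    inputs = list(itertools.product(range(2), repeat=4))
    # Hypothesis 1: h0 encodes A and h1 encodes B.
    print(interchange_accuracy({"A": "h0", "B": "h1"}, inputs))  # 1.0
    # Hypothesis 2: the swapped alignment.
    print(interchange_accuracy({"A": "h1", "B": "h0"}, inputs))
```

Under the first alignment every intervention produces matching outputs, so the score is 1.0; under the swapped alignment the low-level model computes A(source_B) * B(source_A) instead of A(source_A) * B(source_B), so some triples disagree and the score drops below 1. A score strictly between 0 and 1 is the kind of graded evidence an approximate notion of abstraction is meant to capture.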

Related research

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (03/05/2023)
Causal abstraction is a promising theoretical framework for explainable ...

Abstracting Causal Models (12/10/2018)
We consider a sequence of successively more restrictive definitions of a...

Causal Abstraction with Soft Interventions (11/22/2022)
Causal abstraction provides a theory describing how several causal model...

Learning Causal Models of Autonomous Agents using Interventions (08/21/2021)
One of the several obstacles in the widespread use of AI systems is the ...

Approximate Causal Abstraction (06/27/2019)
Scientific models describe natural phenomena at different levels of abst...

Abstracting Probabilistic Relational Models (10/04/2018)
Abstraction is a powerful idea widely used in science, to model, reason ...

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca (05/15/2023)
Obtaining human-interpretable explanations of large, general-purpose lan...