Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

03/05/2023
by Atticus Geiger, et al.

Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases: distributed representations. Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to conducting causal abstraction analyses and allows us to find conceptual structure in trained neural nets.
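The abstract's two key moves can be illustrated in miniature: rotate the hidden state into a candidate basis, perform an interchange intervention on one rotated coordinate, and tune the rotation by gradient descent so the intervened output matches the high-level counterfactual. The sketch below is a toy 2-D construction, not the paper's implementation; the mixing angle `PHI`, the `z1 + z2` readout, and the finite-difference gradient (standing in for autograd) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rot(t):
    """2-D rotation matrix for angle t."""
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Toy low-level "network": its 2-D hidden state is a rotated mix of two
# high-level causal variables z1, z2, and its output reads out z1 + z2.
PHI = 0.7                                       # hidden mixing angle, unknown to DAS

def hidden(z):                                  # z: (n, 2) -> hidden states (n, 2)
    return z @ rot(PHI).T

def output(h):                                  # recover z1 + z2 through the mixing
    return h @ (rot(PHI) @ np.array([1.0, 1.0]))

# Base and source runs; the high-level counterfactual takes V1 (= z1)
# from the source run and keeps V2 (= z2) from the base run.
z_base, z_src = rng.normal(size=(2, 256, 2))
h_base, h_src = hidden(z_base), hidden(z_src)
target = z_src[:, 0] + z_base[:, 1]

def das_loss(theta):
    """Distributed interchange intervention in the basis at angle theta."""
    R = rot(theta)
    c_base, c_src = h_base @ R, h_src @ R       # rotate into candidate basis
    c_new = c_base.copy()
    c_new[:, 0] = c_src[:, 0]                   # swap the first rotated coordinate
    y = output(c_new @ R.T)                     # rotate back, read the output
    return np.mean((y - target) ** 2)

# Gradient descent on the alignment angle (central finite differences
# here; a real implementation would backpropagate through the rotation).
theta, lr, eps = 0.0, 0.05, 1e-4
loss0 = das_loss(theta)
for _ in range(400):
    g = (das_loss(theta + eps) - das_loss(theta - eps)) / (2 * eps)
    theta -= lr * g
print("initial loss:", loss0, "final loss:", das_loss(theta))
```

When the learned angle reaches the true mixing angle, swapping the first rotated coordinate swaps exactly the high-level variable V1, so the intervened output matches the counterfactual and the loss drops to zero; no single neuron aligns with V1, only a direction in the hidden space does.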


Related research

- 01/11/2023 · Causal Abstraction for Faithful Model Interpretation
- 05/15/2023 · Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
- 10/04/2018 · Abstracting Probabilistic Relational Models
- 10/24/2022 · Learning Latent Structural Causal Models
- 02/19/2021 · Abstracting data in distributed ledger systems for higher level analytics and visualizations
- 12/01/2021 · Inducing Causal Structure for Interpretable Neural Networks
- 02/23/2023 · Does Deep Learning Learn to Abstract? A Systematic Probing Framework
