Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

09/14/2023
by   Emanuele Marconato, et al.

Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is interpretable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and to establish a stepping stone for new research on human-interpretable representations.
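As a purely illustrative companion to the abstract, the sketch below shows one way an information-theoretic alignment probe of the kind alluded to above could be set up in practice: estimate the mutual information between each dimension of a learned representation and each human-annotated concept, then check how concentrated that information is. This is not the paper's formalization; the function names, the binning-based MI estimator, and the leakage proxy are assumptions made here for illustration only.

# Illustrative sketch, not the paper's method. Assumes a learned representation
# Z (n_samples x n_latents) and human-annotated concepts C (n_samples x n_concepts);
# all names below are hypothetical.
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(x, bins=20):
    # Bin a (possibly continuous) variable so we can use a discrete MI estimator.
    edges = np.histogram(x, bins=bins)[1]
    return np.digitize(x, edges[1:-1])

def alignment_matrix(Z, C, bins=20):
    # Mutual information between every latent dimension and every annotated concept.
    n_latents, n_concepts = Z.shape[1], C.shape[1]
    mi = np.zeros((n_latents, n_concepts))
    for j in range(n_latents):
        zj = discretize(Z[:, j], bins)
        for k in range(n_concepts):
            mi[j, k] = mutual_info_score(zj, discretize(C[:, k], bins))
    return mi

def leakage_score(mi):
    # Rough proxy for concept leakage: the share of mutual information each latent
    # carries about concepts other than its best-matching one. Near zero when every
    # latent dimension is informative about a single concept only.
    best = mi.max(axis=1, keepdims=True)
    return float((mi.sum() - best.sum()) / (mi.sum() + 1e-12))

Under this reading, a representation that is well aligned with the human vocabulary yields an MI matrix with one dominant concept per latent dimension, which echoes the intuition behind the links between alignment, disentanglement, and concept leakage discussed in the abstract.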
