EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction

05/28/2019
by Diane Bouchacourt, et al.

With the advent of deep neural networks, research has increasingly focused on understanding their black-box behavior. In this paper, we propose a new type of self-interpretable model, i.e., an architecture designed to provide explanations along with its predictions. Our method proceeds in two stages and is trained end-to-end: first, our model builds a low-dimensional binary representation of any input, where each feature denotes the presence or absence of a concept. Then, it computes a prediction based only on this binary representation, through a simple linear model. This allows an easy interpretation of the model's output in terms of the presence of particular concepts in the input. The originality of our approach lies in the fact that concepts are discovered automatically at training time, without the need for additional supervision. Concepts correspond to sets of patterns, built on local low-level features (e.g., a part of an image, a word in a sentence), that are easily distinguishable from one another. We experimentally demonstrate the relevance of our approach on classification tasks over two types of data, text and images, showing both its predictive performance and its interpretability.
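The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual model: the weights, dimensions, and fixed threshold below are hypothetical placeholders, and the end-to-end training of the concept extractor (which in the paper learns discrete concept indicators jointly with the classifier) is omitted. It only shows how a binary concept vector feeds a simple linear model whose weights are directly readable as per-concept contributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 low-level input features, 4 concepts, 3 classes.
n_features, n_concepts, n_classes = 16, 4, 3

# Stage 1: a concept extractor scores each concept, then thresholds the
# scores into a binary presence/absence vector (weights random here; in
# the paper they are learned end-to-end).
W_concept = rng.normal(size=(n_concepts, n_features))

def binarize(x, threshold=0.0):
    """Binary concept representation: 1.0 iff the concept score exceeds the threshold."""
    return (W_concept @ x > threshold).astype(float)

# Stage 2: a simple linear classifier over the binary concept vector.
W_cls = rng.normal(size=(n_classes, n_concepts))

def predict(x):
    z = binarize(x)              # 0/1 vector of length n_concepts
    logits = W_cls @ z
    return z, int(np.argmax(logits))

x = rng.normal(size=n_features)
z, label = predict(x)
# Interpretation: W_cls[label, k] tells how much the *presence* of
# concept k contributed to the predicted class.
```

Because the classifier sees only the binary vector `z`, an explanation is simply the list of concepts present in the input together with their linear weights for the predicted class.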


