
On Concept-Based Explanations in Deep Neural Networks

by Chih-Kuan Yeh et al.

Deep neural networks (DNNs) build high-level intelligence on low-level raw features. Understanding this high-level intelligence can be enabled by deciphering the concepts on which the networks base their decisions, akin to human-level thinking. In this paper, we study concept-based explainability for DNNs in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is for explaining a model's prediction behavior. Motivated by performance and variability, we propose two definitions that quantify completeness. We show that, under degenerate conditions, our method is equivalent to Principal Component Analysis. Next, we propose a concept discovery method that imposes two additional constraints to encourage the interpretability of the discovered concepts. We use game-theoretic notions to aggregate over concept subsets and define an importance score for each discovered concept, which we call ConceptSHAP. On specifically designed synthetic datasets and on real-world text and image datasets, we validate the effectiveness of our framework in finding concepts that are both complete in explaining the model's decisions and interpretable.
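The game-theoretic aggregation the abstract describes can be illustrated with a minimal sketch: treat the completeness score of a concept subset as a coalition value and compute each concept's exact Shapley value. The `completeness` callable here is a hypothetical stand-in for the paper's completeness metric, not the authors' implementation.

```python
from itertools import combinations
from math import factorial

def concept_shap(num_concepts, completeness):
    """Exact Shapley-value importance for each concept.

    `completeness` is assumed to map a frozenset of concept indices
    to a scalar score (a stand-in for the paper's completeness metric).
    Exact enumeration is only feasible for a small number of concepts.
    """
    m = num_concepts
    scores = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        s_i = 0.0
        for k in range(m):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Shapley weight for a coalition of size k out of m players
                weight = factorial(k) * factorial(m - k - 1) / factorial(m)
                # Marginal gain in completeness from adding concept i
                s_i += weight * (completeness(S | {i}) - completeness(S))
        scores.append(s_i)
    return scores

# Toy usage: with an additive completeness function, each concept's
# ConceptSHAP score recovers its additive contribution.
weights = [0.5, 0.3, 0.2]
scores = concept_shap(3, lambda S: sum(weights[j] for j in S))
```

For an additive completeness function the Shapley values equal the per-concept weights; in the paper's setting the scores instead reflect each concept's average marginal contribution to explaining the model.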




Related Research

Concept-Based Explanations for Tabular Data

The interpretability of machine learning models has been an essential ar...

Concept-based Explanations for Out-Of-Distribution Detectors

Out-of-distribution (OOD) detection plays a crucial role in ensuring the...

Explaining Deep Neural Networks using Unsupervised Clustering

We propose a novel method to explain trained deep neural networks (DNNs)...

Cause and Effect: Concept-based Explanation of Neural Networks

In many scenarios, human decisions are explained based on some high-leve...

Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees

The completeness axiom renders the explanation of a post-hoc XAI method ...

Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification

With the popularity of deep neural networks (DNNs), model interpretabili...

Provable concept learning for interpretable predictions using variational inference

In safety critical applications, practitioners are reluctant to trust ne...

Code Repositories


PyTorch Transformer-based Language Model Implementation of ConceptSHAP
