Learning to Intervene on Concept Bottlenecks

08/25/2023
by David Steinmann, et al.

While traditional deep learning models often lack interpretability, concept bottleneck models (CBMs) provide inherent explanations via their concept representations. Specifically, they allow users to perform interventional interactions on these concepts by updating the concept values, thereby correcting the predictive output of the model. Traditionally, however, these interventions are applied to the model only once and then discarded. To rectify this, we present concept bottleneck memory models (CB2Ms), an extension to CBMs. A CB2M learns to generalize interventions to appropriate novel situations via a two-fold memory, with which it can learn to detect mistakes and to reapply previous interventions. In this way, a CB2M learns to automatically improve model performance from a few initially obtained interventions. If no prior human interventions are available, a CB2M can detect potential mistakes of the CBM bottleneck and request targeted interventions. In our experimental evaluations on challenging scenarios, such as handling distribution shifts and confounded training data, we show that CB2Ms are able to successfully generalize interventions to unseen data and can indeed identify wrongly inferred concepts. Overall, our results show that CB2Ms are a valuable tool for providing interactive feedback on CBMs, e.g., by guiding a user's interactions and reducing the number of required interventions.
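As a rough illustration of the two-fold memory idea described in the abstract, the sketch below assumes a trained CBM whose bottleneck yields an input encoding and concept predictions. The class name `CB2MMemory`, the methods `detect_mistake` and `reapply_intervention`, and the nearest-neighbor distance thresholds are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class CB2MMemory:
    """Illustrative two-fold memory: one part flags likely bottleneck
    mistakes, the other stores past interventions for reuse.
    (Hypothetical sketch, not the paper's actual implementation.)"""

    def __init__(self, mistake_threshold=1.0, reuse_threshold=0.5):
        self.mistake_encodings = []   # encodings where the CBM previously erred
        self.interventions = []       # (encoding, corrected_concepts) pairs
        self.mistake_threshold = mistake_threshold
        self.reuse_threshold = reuse_threshold

    def add_intervention(self, encoding, corrected_concepts):
        # Remember both that a mistake occurred here and how it was corrected.
        encoding = np.asarray(encoding)
        self.mistake_encodings.append(encoding)
        self.interventions.append((encoding, np.asarray(corrected_concepts)))

    def detect_mistake(self, encoding):
        # Flag a potential mistake if the new encoding lies close to a
        # previously observed mistake (nearest-neighbor distance check).
        if not self.mistake_encodings:
            return False
        dists = [np.linalg.norm(encoding - e) for e in self.mistake_encodings]
        return min(dists) < self.mistake_threshold

    def reapply_intervention(self, encoding, predicted_concepts):
        # Reuse the stored correction from the closest past intervention,
        # falling back to the CBM's own concept predictions otherwise.
        if not self.interventions:
            return predicted_concepts
        dists = [np.linalg.norm(encoding - e) for e, _ in self.interventions]
        idx = int(np.argmin(dists))
        if dists[idx] < self.reuse_threshold:
            return self.interventions[idx][1]
        return predicted_concepts
```

In such a setup, the CBM's encoding for a new input would first be checked with `detect_mistake`; if a likely error is flagged and a sufficiently similar stored intervention exists, its corrected concept values replace the predicted ones before the final prediction head is applied, otherwise the user can be asked for a targeted intervention.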

