B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers

06/19/2023
by Moritz Böhle, et al.

We present a new direction for increasing the interpretability of deep neural networks (DNNs): promoting weight-input alignment during training. To this end, we propose replacing the linear transformations in DNNs with our novel B-cos transformation. As we show, a sequence (network) of such transformations induces a single linear transformation that faithfully summarises the full model computations. Moreover, the B-cos transformation is designed such that its weights align with task-relevant signals during optimisation; as a result, the induced linear transformations become highly interpretable and highlight task-relevant features. Importantly, the B-cos transformation is compatible with existing architectures: by combining B-cos-based explanations with normalisation and attention layers, it can easily be integrated into virtually all of the latest state-of-the-art models for computer vision, e.g. ResNets, DenseNets, ConvNeXt models, and Vision Transformers, whilst maintaining similar accuracy on ImageNet. Finally, we show that the resulting explanations are of high visual quality and perform well under quantitative interpretability metrics.
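
For intuition, a minimal PyTorch sketch of a B-cos unit is given below. It follows the formulation B-cos(x; w) = |cos(x, w)|^(B-1) * (ŵᵀx) with unit-norm ŵ, as introduced in the companion paper "B-cos Networks: Alignment is All We Need for Interpretability" (listed under related research); the class name BcosLinear, the weight initialisation, and the numerical clamp are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BcosLinear(nn.Module):
    """Illustrative B-cos unit: scales each output by |cos(x, w)|^(B-1)."""

    def __init__(self, in_features: int, out_features: int, b: float = 2.0):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.b = b  # b = 1 recovers an ordinary bias-free linear layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-norm weight rows, so F.linear yields w_hat^T x = ||x|| cos(x, w_hat).
        w_hat = F.normalize(self.weight, dim=1)
        lin = F.linear(x, w_hat)
        cos = lin / x.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        # Down-weight misaligned inputs by |cos|^(b-1); during training this
        # pressures the weights to align with task-relevant input patterns.
        return cos.abs().pow(self.b - 1.0) * lin

layer = BcosLinear(4, 2, b=2.0)
y = layer(torch.randn(8, 4))  # y has shape (8, 2)
```

Note that for any fixed input the layer acts as a plain linear map with effective weights |cos(x, w)|^(B-1) * ŵ; composing such layers yields the single input-dependent linear transformation that the abstract describes as a faithful summary of the full model computations.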


Related research

05/20/2022: B-cos Networks: Alignment is All We Need for Interpretability
We present a new direction for increasing the interpretability of deep n...

06/05/2022: Interpretable Mixture of Experts for Structured Data
With the growth of machine learning for structured data, the need for re...

09/27/2021: Optimising for Interpretability: Convolutional Dynamic Alignment Networks
We introduce a new family of neural network models called Convolutional ...

01/20/2023: Holistically Explainable Vision Transformers
Transformers increasingly dominate the machine learning landscape across...

03/31/2021: Convolutional Dynamic Alignment Networks for Interpretable Classifications
We introduce a new family of neural network models called Convolutional ...

02/19/2022: Do Transformers use variable binding?
Increasing the explainability of deep neural networks (DNNs) requires ev...

01/29/2023: Towards Verifying the Geometric Robustness of Large-scale Neural Networks
Deep neural networks (DNNs) are known to be vulnerable to adversarial ge...
