Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models

by Patrik Hammersborg, et al.

Neural network models are widely used in a variety of domains, often as black-box solutions, since they are not directly interpretable by humans. The field of explainable artificial intelligence aims to develop explanation methods that address this challenge, and several approaches have been developed in recent years, including methods for investigating what type of knowledge these models internalise during training. Among these, concept detection investigates which concepts neural network models learn to represent in order to complete their tasks. In this work, we present an extension to concept detection, named concept backpropagation, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model. In this approach, the model input is perturbed, guided by a concept probe trained for the model, such that the concept of interest is maximised. This allows the detected concept to be visualised directly in the input space of the model, which in turn makes it possible to see what information the model depends on for representing that concept. We present results for this method applied to a variety of input modalities, and discuss how the proposed method can be used to visualise what information trained concept probes use, and the degree to which the representation of the probed concept is entangled within the neural network model itself.
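The probe-guided perturbation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a toy frozen model (one tanh hidden layer with random weights), a linear logistic concept probe on that hidden layer whose random weights stand in for a trained probe, and plain gradient ascent on the input with an L2 penalty that keeps the perturbed input near the original. The names `hidden`, `probe_score`, `objective`, `objective_grad` and `concept_backprop`, the penalty term, and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen model: 4-dim input -> 8 tanh hidden units.
W1 = 0.5 * rng.normal(size=(8, 4))
b1 = np.zeros(8)

# Hypothetical linear concept probe on the hidden layer; the random weights
# stand in for a probe that has been trained to detect the concept.
w_probe = rng.normal(size=8)
c_probe = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hidden(x):
    """Hidden-layer activations of the frozen model."""
    return np.tanh(W1 @ x + b1)


def probe_score(x):
    """Probe's estimated probability that the concept is present for input x."""
    return sigmoid(w_probe @ hidden(x) + c_probe)


def objective(x, x0, reg):
    """Log concept score minus an (assumed) L2 penalty on distance from x0."""
    return np.log(probe_score(x)) - reg * np.sum((x - x0) ** 2)


def objective_grad(x, x0, reg):
    """Analytic gradient of `objective` with respect to the input x."""
    h = hidden(x)
    s = sigmoid(w_probe @ h + c_probe)
    # Chain rule: dlog(s)/dz = 1 - s, dz/dh = w_probe,
    # dh/dpre = 1 - h**2, dpre/dx = W1.
    dz_dx = W1.T @ (w_probe * (1.0 - h ** 2))
    return (1.0 - s) * dz_dx - 2.0 * reg * (x - x0)


def concept_backprop(x0, steps=500, lr=0.01, reg=0.01):
    """Perturb x0 by gradient ascent so that the probed concept is maximised."""
    x = x0.copy()
    for _ in range(steps):
        x = x + lr * objective_grad(x, x0, reg)
    return x


x0 = rng.normal(size=4)
x_star = concept_backprop(x0)
print("probe score before/after:", probe_score(x0), probe_score(x_star))
print("input-space change:", x_star - x0)
```

The difference `x_star - x0` is the change in input space that raises the probe's concept score; for image inputs it could be displayed as an image. The L2 penalty is only one simple way of keeping the perturbation small, and a real model would typically obtain the input gradient through automatic differentiation rather than a hand-derived formula.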

