Learned Visual Features to Textual Explanations

09/01/2023
by Saeid Asgari Taghanaki, et al.

Interpreting the learned features of vision models has posed a longstanding challenge in the field of machine learning. To address this issue, we propose a novel method that leverages the capabilities of large language models (LLMs) to interpret the learned features of pre-trained image classifiers. Our method, called TExplain, tackles this task by training a neural network to establish a connection between the feature space of image classifiers and that of LLMs. Then, during inference, our approach generates a vast number of sentences to explain the features learned by the classifier for a given image. These sentences are then used to extract the most frequent words, providing a comprehensive understanding of the learned features and patterns within the classifier. Our method, for the first time, utilizes these frequent words corresponding to a visual representation to provide insights into the decision-making process of the independently trained classifier, enabling the detection of spurious correlations and biases and fostering a deeper comprehension of its behavior. To validate the effectiveness of our approach, we conduct experiments on diverse datasets, including ImageNet-9L and Waterbirds. The results demonstrate the potential of our method to enhance the interpretability and robustness of image classifiers.
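The final aggregation step described above, distilling many LLM-generated explanation sentences into their most frequent content words, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `STOPWORDS` set and the toy sentences are assumptions, and the actual sentence generation from classifier features is out of scope here.

```python
from collections import Counter
import re

# Hypothetical stopword list; the paper's actual filtering is not specified here.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "is", "near", "with", "and", "to"}

def frequent_words(sentences, top_k=3):
    """Count content words across generated explanation sentences and
    return the top_k most frequent ones."""
    counts = Counter()
    for s in sentences:
        for w in re.findall(r"[a-z]+", s.lower()):
            if w not in STOPWORDS:
                counts[w] += 1
    return [w for w, _ in counts.most_common(top_k)]

# Toy example: sentences an LLM might emit for a waterbird image whose
# features were mapped into the language-model space.
sentences = [
    "a bird standing on water",
    "the bird is near the water",
    "a seabird on the ocean water",
]
print(frequent_words(sentences, top_k=2))  # -> ['water', 'bird']
```

If "water" dominates the explanations for a land-bird image, that frequency profile is exactly the kind of signal the method uses to flag a spurious background correlation.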
