Doubly Right Object Recognition: A Why Prompt for Visual Rationales

12/12/2022
by Chengzhi Mao, et al.

Many visual recognition models are evaluated only on their classification accuracy, a metric on which they perform strongly. In this paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a “doubly right” object recognition benchmark, whose metric requires the model to produce both the right label and the right rationale for the same input. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a “why prompt,” which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on doubly right object recognition, as well as zero-shot transfer to unseen tasks and datasets.
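To make the "doubly right" criterion concrete, here is a minimal Python sketch of how such a metric could be scored. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `doubly_right_accuracy`, the representation of rationales as plain strings, and the idea of a per-example set of acceptable rationales are all hypothetical choices made for this example.

```python
from typing import Sequence, Set


def doubly_right_accuracy(
    pred_labels: Sequence[str],
    pred_rationales: Sequence[str],
    true_labels: Sequence[str],
    valid_rationales: Sequence[Set[str]],
) -> float:
    """Fraction of examples with BOTH a correct label and a correct rationale.

    Assumption (not from the paper): a rationale counts as correct when it
    appears in a given set of acceptable rationales for that example.
    """
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one example")
    if not (len(pred_labels) == len(pred_rationales) == len(valid_rationales) == n):
        raise ValueError("all inputs must have the same length")

    doubly_right = 0
    for label, rationale, truth, accepted in zip(
        pred_labels, pred_rationales, true_labels, valid_rationales
    ):
        label_ok = label == truth
        rationale_ok = rationale in accepted
        # An example only counts when label and rationale are both right.
        if label_ok and rationale_ok:
            doubly_right += 1
    return doubly_right / n


# Toy data: only the first example is right on both counts.
labels = ["dog", "cat", "bird"]
rationales = ["floppy ears", "whiskers", "wheels"]
truth = ["dog", "dog", "bird"]
accepted = [{"floppy ears", "wagging tail"}, {"wagging tail"}, {"feathers", "beak"}]
score = doubly_right_accuracy(labels, rationales, truth, accepted)
```

On this toy data, plain label accuracy would be 2/3, while the doubly right score is 1/3, because the third example has the right label but an unacceptable rationale. That gap is what the benchmark is designed to expose.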


