Learning to Name Classes for Vision and Language Models

04/04/2023
by Sarah Parisot et al.

Large-scale vision and language models can achieve impressive zero-shot recognition performance by mapping class-specific text queries to image content. Two distinct challenges remain, however: high sensitivity to the choice of handcrafted class names that define the queries, and the difficulty of adapting to new, smaller datasets. To address these problems, we propose to leverage available data to learn, for each class, an optimal word embedding as a function of the visual content. By learning new word embeddings on an otherwise frozen model, we are able to retain zero-shot capabilities for new classes, easily adapt models to new datasets, and adjust potentially erroneous, non-descriptive or ambiguous class names. We show that our solution can easily be integrated into image classification and object detection pipelines, yields significant performance gains in multiple scenarios, and provides insights into model biases and labelling errors.
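The abstract does not give implementation details, so the following is only a minimal pure-Python sketch of the core idea under strong simplifying assumptions. A fixed linear map `W` stands in for the frozen text encoder (hypothetical; real models run a transformer over a full prompt), image features are fixed vectors, and a class is scored by the dot product between an image feature and its text feature. Only one word embedding per class is updated by cross-entropy gradient descent. All names (`learn_class_embeddings`, `class_logits`, the toy data) are hypothetical illustrations, not the authors' code.

```python
import math

def matvec(W, v):
    # W is a list of D rows of length K; v has length K; result has length D
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def class_logits(W, E, x, scale):
    # Hypothetical frozen "text encoder": text feature t_c = W e_c.
    # The logit for class c is the scaled dot product between image feature x and t_c.
    return [scale * dot(x, matvec(W, e)) for e in E]

def accuracy(W, E, data, scale=1.0):
    correct = 0
    for x, y in data:
        logits = class_logits(W, E, x, scale)
        pred = max(range(len(logits)), key=lambda c: logits[c])  # ties go to the lowest index
        correct += int(pred == y)
    return correct / len(data)

def learn_class_embeddings(W, E_init, data, scale=1.0, lr=0.5, steps=500):
    """Full-batch gradient descent on the per-class word embeddings only.

    W (the frozen text encoder) and the image features in `data` are never modified.
    """
    E = [list(e) for e in E_init]  # copy: the initial "handcrafted" embeddings stay intact
    K, D = len(E[0]), len(W)
    for _ in range(steps):
        grad = [[0.0] * K for _ in E]
        for x, y in data:
            probs = softmax(class_logits(W, E, x, scale))
            Wt_x = [sum(W[d][k] * x[d] for d in range(D)) for k in range(K)]  # W^T x
            for c in range(len(E)):
                # Cross-entropy gradient: dL/de_c = scale * (p_c - 1[c == y]) * W^T x
                g = scale * (probs[c] - (1.0 if c == y else 0.0))
                for k in range(K):
                    grad[c][k] += g * Wt_x[k]
        for c in range(len(E)):
            for k in range(K):
                E[c][k] -= lr * grad[c][k] / len(data)
    return E

# Toy setup (all values hypothetical): 2-D image features, 3 classes.
W = [[1.0, 0.5], [0.0, 1.0]]
# Classes 0 and 1 start from the same embedding, mimicking an ambiguous class name.
E_init = [[0.5, 0.5], [0.5, 0.5], [-0.5, -0.5]]
data = [([1.0, 0.1], 0), ([0.9, -0.1], 0),
        ([0.1, 1.0], 1), ([-0.1, 0.9], 1),
        ([-1.0, -1.0], 2), ([-0.8, -1.1], 2)]

E_learned = learn_class_embeddings(W, E_init, data)
print("accuracy before:", accuracy(W, E_init, data))
print("accuracy after: ", accuracy(W, E_learned, data))
```

With the two identical starting embeddings, classes 0 and 1 cannot be told apart; learning the embeddings from image data separates them while the encoder stays untouched. Because only `E` changes, a class without a learned embedding could still be scored from its original word embedding, which loosely mirrors how the abstract says zero-shot capability is retained.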
