What's in a Name? Beyond Class Indices for Image Recognition

04/05/2023
by   Kai Han, et al.

Existing machine learning models demonstrate excellent performance in image object recognition after training on a large-scale dataset under full supervision. However, these models only learn to map an image to a predefined class index, without revealing the actual semantic meaning of the object in the image. In contrast, vision-language models like CLIP are able to assign semantic class names to unseen objects in a "zero-shot" manner, although they still rely on a predefined set of candidate names at test time. In this paper, we reconsider the recognition problem and task a vision-language model with assigning class names to images given only a large and essentially unconstrained vocabulary of categories as prior information. We use non-parametric methods to establish relationships between images, which allow the model to automatically narrow down the set of possible candidate names. Specifically, we propose iteratively clustering the data and voting on class names within each cluster, showing that this enables a roughly 50% improvement over the baseline on ImageNet. Furthermore, we tackle this problem in both unsupervised and partially supervised settings, and with both a coarse-grained and a fine-grained search space as the unconstrained dictionary.
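The cluster-then-vote idea can be illustrated with a toy sketch. This is not the authors' implementation: the embeddings are hand-made 2-D vectors standing in for CLIP image and text features, a simple cosine-based k-means stands in for the paper's non-parametric clustering, and every function and variable name below is hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(vec, candidates):
    """Index of the candidate vector most similar to vec."""
    return max(range(len(candidates)), key=lambda i: cosine(vec, candidates[i]))

def kmeans(points, k, iters=20):
    """Cosine-based k-means with deterministic farthest-first initialisation.
    Assumption: a stand-in for whatever clustering the paper actually uses."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = min(points, key=lambda p: max(cosine(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_and_vote(image_embs, name_embs, names, k, rounds=3):
    """Cluster images, vote on a name per cluster, then narrow the vocabulary
    to the winning names and repeat until the candidate set stops shrinking."""
    candidates = list(range(len(names)))
    labels = []
    for _ in range(rounds):
        assign = kmeans(image_embs, k)
        cand_vecs = [name_embs[i] for i in candidates]
        # Per-image guess: the most similar name among current candidates.
        guesses = [candidates[nearest(v, cand_vecs)] for v in image_embs]
        winners = {}
        for c in set(assign):
            votes = Counter(g for g, a in zip(guesses, assign) if a == c)
            winners[c] = votes.most_common(1)[0][0]
        labels = [winners[a] for a in assign]
        narrowed = sorted(set(winners.values()))
        if narrowed == candidates:
            break
        candidates = narrowed
    return [names[i] for i in labels]

# Toy data (hypothetical): four vocabulary names and eight "images".
NAMES = ["cat", "lynx", "truck", "van"]
NAME_VECS = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0), (0.6, 0.8)]
IMAGES = [(1.0, 0.1), (1.0, 0.0), (0.9, 0.1), (0.9, 0.4),
          (0.1, 1.0), (0.0, 1.0), (0.2, 1.0), (0.4, 0.9)]

if __name__ == "__main__":
    print(cluster_and_vote(IMAGES, NAME_VECS, NAMES, k=2))
```

In this toy data, the fourth and eighth images are individually closest to "lynx" and "van", but the majority vote inside their clusters relabels them as "cat" and "truck", and the second round runs with the vocabulary narrowed to those two names.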

research
06/24/2023

DesCo: Learning Object Recognition with Rich Language Descriptions

Recent development in vision-language approaches has instigated a paradi...
research
04/04/2023

Learning to Name Classes for Vision and Language Models

Large scale vision and language models can achieve impressive zero-shot ...
research
10/18/2022

Perceptual Grouping in Vision-Language Models

Recent advances in zero-shot image recognition suggest that vision-langu...
research
04/03/2023

AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation

Open-set Unsupervised Video Domain Adaptation (OUVDA) deals with the tas...
research
09/21/2023

SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features

Wikipedia articles are hierarchically organized through categories and l...
research
04/24/2016

Semi-supervised Vocabulary-informed Learning

Despite significant progress in object categorization, in recent years, ...
research
08/06/2020

Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning

Zero-shot learning (ZSL) makes object recognition in images possible in ...
