CLIP-GCD: Simple Language Guided Generalized Category Discovery

05/17/2023
by Rabah Ouldnoughi, et al.

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leverage self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering. In this paper, we posit that such methods remain prone to poor performance on out-of-distribution categories, and that they do not exploit a key ingredient: semantic relationships between object categories. We therefore propose to leverage multi-modal (vision and language) models, in two complementary ways. First, we establish a strong baseline by replacing uni-modal features with CLIP features, motivated by CLIP's zero-shot performance. Second, we propose a novel retrieval-based mechanism that leverages CLIP's aligned vision-language representations by mining text descriptions from a text corpus for the labeled and unlabeled sets. Specifically, we use the alignment between CLIP's visual encoding of an image and its textual encoding of the corpus to retrieve the top-k most relevant pieces of text, and incorporate their embeddings to perform joint image+text semi-supervised clustering. We perform rigorous experiments and ablations (including on where to retrieve from, how much to retrieve, and how to combine information), and validate our results on several datasets, including out-of-distribution domains, demonstrating state-of-the-art results.
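To make the retrieval mechanism concrete, below is a minimal sketch of the joint image+text feature construction, assuming precomputed CLIP embeddings for the images and for a text corpus. Plain k-means stands in for the semi-supervised clustering used in the paper, and all function names and the random data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def retrieve_topk_text(image_embs, corpus_embs, k=5):
    """For each image, retrieve the k most similar corpus texts by cosine similarity."""
    image_embs = normalize(image_embs)            # unit-norm CLIP image embeddings
    corpus_embs = normalize(corpus_embs)          # unit-norm CLIP text embeddings
    sims = image_embs @ corpus_embs.T             # cosine similarities, (n_images, n_texts)
    topk_idx = np.argsort(-sims, axis=1)[:, :k]   # indices of the top-k texts per image
    return corpus_embs[topk_idx]                  # (n_images, k, dim)

def joint_features(image_embs, corpus_embs, k=5):
    """Concatenate each image embedding with the mean of its retrieved text embeddings."""
    text_embs = retrieve_topk_text(image_embs, corpus_embs, k).mean(axis=1)
    return np.concatenate([normalize(image_embs), text_embs], axis=1)

# Illustrative stand-ins for CLIP embeddings (in practice: model.encode_image / encode_text).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 512))    # labeled + unlabeled image embeddings
corpus_embs = rng.normal(size=(1000, 512))  # text-corpus embeddings

feats = joint_features(image_embs, corpus_embs, k=5)
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(feats)  # stand-in for semi-supervised k-means
```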

Related Research

03/30/2023
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
Generalized category discovery (GCD) is a recently proposed open-world p...

05/29/2023
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
Recently, large-scale pre-trained Vision and Language (VL) models have s...

02/07/2020
Snippext: Semi-supervised Opinion Mining with Augmented Data
Online services are interested in solutions to opinion mining, which is ...

06/02/2023
Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks ...

09/21/2023
Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning
In open-world semi-supervised learning, a machine learning model is task...

04/16/2023
USNID: A Framework for Unsupervised and Semi-supervised New Intent Discovery
New intent discovery is of great value to natural language processing, a...

08/22/2023
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Pre-trained vision-language models, e.g., CLIP, working with manually de...
