Texts as Images in Prompt Tuning for Multi-Label Image Recognition

11/23/2022
by Zixian Guo, et al.

Prompt tuning has been employed as an efficient way to adapt large vision-language pre-trained models (e.g., CLIP) to various downstream tasks in data-limited or label-limited settings. Nonetheless, existing methods require visual data (e.g., images) by default for learning prompts. In this work, we argue that the effectiveness of image-text contrastive learning in aligning the two modalities (as used for training CLIP) makes it feasible to treat texts as images for prompt tuning, and we introduce TaI prompting. In contrast to visual data, text descriptions are easy to collect, and their class labels can be derived directly. In particular, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning. Moreover, building on TaI, we present double-grained prompt tuning (TaI-DPT), which extracts both coarse-grained and fine-grained embeddings to enhance multi-label recognition performance. Experimental results show that the proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks, e.g., MS-COCO, VOC2007, and NUS-WIDE, and that it can be combined with existing methods that learn prompts from images to improve recognition performance further. Code is released at https://github.com/guozix/TaI-DPT.
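To make the claim that class labels "can be derived directly" from text more concrete, the following is a minimal sketch of one way this could be done: match class names (and a few surface forms) against the words of a sentence to get a multi-hot label vector. The class vocabulary, the synonym sets, and the function names `derive_labels` and `build_text_training_set` are hypothetical and chosen for illustration only; the released TaI-DPT code may derive labels differently.

```python
import re

# Hypothetical class vocabulary (an assumption, not taken from the paper):
# each class name maps to a small set of surface forms to match in text.
CLASS_SYNONYMS = {
    "person": {"person", "people", "man", "woman"},
    "dog": {"dog", "dogs", "puppy"},
    "bicycle": {"bicycle", "bicycles", "bike"},
}
CLASSES = list(CLASS_SYNONYMS)  # fixed class order: person, dog, bicycle


def derive_labels(sentence):
    """Return a multi-hot label vector for one sentence.

    A class is marked present if any of its surface forms appears
    as a whole word in the lower-cased sentence.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if tokens & CLASS_SYNONYMS[c] else 0 for c in CLASSES]


def build_text_training_set(sentences):
    """Pair each sentence with its derived labels.

    Sentences that mention no known class are dropped, since they
    carry no supervision for these classes.
    """
    pairs = []
    for sentence in sentences:
        labels = derive_labels(sentence)
        if any(labels):
            pairs.append((sentence, labels))
    return pairs


if __name__ == "__main__":
    corpus = [
        "A man rides a bike with his dog.",
        "A red car is parked on the street.",
        "Two dogs play in the park.",
    ]
    for sentence, labels in build_text_training_set(corpus):
        print(labels, sentence)
```

In a text-as-image setup, each retained sentence would then be encoded by the text encoder and used in place of an image feature when tuning the prompts; that step is omitted here because it depends on the specific model and training code.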
