DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

by Ping Hu, et al.

Multi-label image recognition in the low-label regime is a challenging task of great practical significance. Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations. In this work, we leverage the powerful alignment between textual and visual features pretrained on millions of auxiliary image-text pairs. We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++), which serves as a unified approach to both partial-label and zero-shot multi-label recognition. In DualCoOp++, we separately encode evidential, positive, and negative contexts for target classes as parametric components of the linguistic input (i.e., prompts). The evidential context aims to discover all the visual content related to the target class and serves as guidance for aggregating the positive and negative contexts over the spatial domain of the image, enabling better discrimination between similar categories. Additionally, we introduce a Winner-Take-All module that promotes inter-class interaction during training without introducing extra parameters or cost. Because DualCoOp++ adds minimal learnable overhead to the pretrained vision-language framework, it enables rapid adaptation to multi-label recognition tasks with limited annotations and even to unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the superior performance of our approach compared to state-of-the-art methods.
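The abstract describes the evidence-guided aggregation only at a high level. The NumPy sketch below illustrates one plausible reading of it, under loudly stated assumptions: region features and prompt embeddings are assumed to come from a pretrained vision-language model, scores are assumed to be cosine similarities, and the function name `evidence_guided_scores`, the temperatures `tau_spatial`, `tau_class`, `tau_out`, and the class-axis softmax used as a stand-in for the Winner-Take-All module are all hypothetical. It is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def evidence_guided_scores(region_feats, evid_emb, pos_emb, neg_emb,
                           tau_spatial=0.1, tau_class=0.1, tau_out=0.1,
                           use_wta=True):
    """Hypothetical sketch of evidence-guided positive/negative scoring.

    region_feats: (N, D) visual features for N spatial regions of one image.
    evid_emb, pos_emb, neg_emb: (C, D) text embeddings of the evidential,
        positive and negative prompts for C classes (assumed inputs).
    Returns a (C,) array of probabilities that each class is present.
    """
    f = l2_normalize(np.asarray(region_feats, dtype=float))
    # (C, N) cosine similarity of every class prompt with every region.
    s_evid = l2_normalize(np.asarray(evid_emb, dtype=float)) @ f.T
    s_pos = l2_normalize(np.asarray(pos_emb, dtype=float)) @ f.T
    s_neg = l2_normalize(np.asarray(neg_emb, dtype=float)) @ f.T

    logits = s_evid / tau_spatial
    if use_wta:
        # Assumed stand-in for Winner-Take-All: a parameter-free softmax over
        # the class axis makes classes compete for each region; adding its log
        # down-weights regions that another class claims more strongly.
        logits = logits + log_softmax(s_evid / tau_class, axis=0)

    # Evidence-guided spatial weights: one distribution over regions per class.
    weights = np.exp(log_softmax(logits, axis=1))

    # Positive and negative contexts are pooled with the SAME spatial weights.
    pos_score = (weights * s_pos).sum(axis=1)
    neg_score = (weights * s_neg).sum(axis=1)

    # A two-way softmax over (positive, negative) equals a sigmoid of the difference.
    return 1.0 / (1.0 + np.exp(-(pos_score - neg_score) / tau_out))


# Toy example: region 0 matches class 0's positive prompt, while region 1
# matches class 1's negative prompt, so class 0 should score near 1 and class 1 near 0.
regions = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
evid = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
pos = np.array([[1.0, 0, 0, 0], [0, 0, 0, 1.0]])
neg = np.array([[0, 0, 1.0, 0], [0, 1.0, 0, 0]])
probs = evidence_guided_scores(regions, evid, pos, neg)
print(probs.round(3))
```

Pooling the positive and negative similarities with one shared set of weights means both contexts are compared over the same regions that the evidential prompt selected, which is the intuition the abstract gives for better discrimination between similar categories. The class-axis softmax adds no learnable parameters, consistent with the abstract's claim for the Winner-Take-All module, but it is only an illustrative substitute for the paper's actual formulation.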

