Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer

07/05/2022
by Sunan He, et al.

Real-world recognition systems often encounter many unseen labels in practice. To identify such unseen labels, multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge through a pre-trained textual label embedding (e.g., GloVe). However, such methods exploit only single-modal knowledge from a language model and ignore the rich semantic information inherent in image-text pairs. In contrast, recently developed open-vocabulary (OV) methods succeed in exploiting this image-text information in object detection and achieve impressive performance. Inspired by the success of OV-based methods, we propose a novel open-vocabulary framework, named multi-modal knowledge transfer (MKT), for multi-label classification. Specifically, our method exploits the multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model. To facilitate transferring the image-text matching ability of the VLP model, knowledge distillation is used to guarantee the consistency of image and label embeddings, along with prompt tuning to further update the label embeddings. To further recognize multiple objects, a simple but effective two-stream module is developed to capture both local and global features. Extensive experimental results show that our method significantly outperforms state-of-the-art methods on public benchmark datasets. Code will be available at https://github.com/seanhe97/MKT.
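As a rough illustration of the ideas named in the abstract, the sketch below scores labels by combining a global image embedding with local patch embeddings, and measures how far a student image embedding drifts from a frozen teacher embedding. It is not the authors' implementation: the function names, the top-k pooling of patch similarities, the equal weighting of the two streams, the L1 form of the distillation term, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def two_stream_scores(global_emb, patch_embs, label_embs, k=2):
    """Score each label from a global and a local stream (hypothetical helper).

    global_emb: one embedding for the whole image.
    patch_embs: one embedding per image patch.
    label_embs: dict mapping label name -> text embedding of that label.
    k:          number of best-matching patches averaged in the local stream
                (an assumed pooling choice).
    """
    scores = {}
    for label, text_emb in label_embs.items():
        # Global stream: match the whole-image embedding against the label.
        global_score = cosine(global_emb, text_emb)
        # Local stream: average the k patches most similar to the label.
        patch_sims = sorted((cosine(p, text_emb) for p in patch_embs), reverse=True)
        top = patch_sims[:k]
        local_score = sum(top) / len(top)
        # Assumed fusion: plain average of the two streams.
        scores[label] = 0.5 * (global_score + local_score)
    return scores


def distillation_loss(student_emb, teacher_emb):
    """Mean absolute difference between student and teacher image embeddings
    (an assumed L1-style consistency term)."""
    return sum(abs(s - t) for s, t in zip(student_emb, teacher_emb)) / len(student_emb)


# Toy data: labels never need to be seen in training, only embedded as text.
label_embs = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}
global_emb = [1.0, 0.0]                       # image dominated by a dog
patch_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one patch shows a car

scores = two_stream_scores(global_emb, patch_embs, label_embs, k=2)
loss = distillation_loss([0.5, 0.5], [1.0, 0.0])
```

In this toy setup the global stream alone gives "car" a score of zero, while the local stream still picks up the single car patch, which hints at why a second stream can help when an image contains several objects.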


