CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval

09/18/2023
by   Yating Liu, et al.

Text-based Person Retrieval (TBPR) aims to retrieve images of a target person given a textual query. The primary challenge lies in bridging the substantial gap between the vision and language modalities, especially given the scarcity of large-scale datasets. In this paper, we introduce a CLIP-based Synergistic Knowledge Transfer (CSKT) approach for TBPR. Specifically, to explore CLIP's knowledge on the input side, we first propose a Bidirectional Prompts Transferring (BPT) module constructed from text-to-image and image-to-text bidirectional prompts and coupling projections. Second, Dual Adapters Transferring (DAT) is designed to transfer knowledge on the output side of Multi-Head Self-Attention (MHSA) in both the vision and language branches. This synergistic two-way collaborative mechanism promotes early-stage feature fusion and efficiently exploits the existing knowledge of CLIP. CSKT outperforms state-of-the-art approaches on three benchmark datasets while its trainable parameters account for merely 7.4% of the entire model, demonstrating its efficiency, effectiveness and generalization.
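The abstract does not spell out the adapter architecture, so the following is only a minimal sketch of the general idea of placing a small trainable module on the output of a frozen MHSA block. It assumes a generic bottleneck adapter (down-projection, ReLU, up-projection, residual connection), which may differ from the DAT design in the paper. The names `BottleneckAdapter` and `trainable_fraction`, the dimensions, and the initialization are all hypothetical.

```python
import random


def make_matrix(rows, cols, scale, rng):
    """Return a rows x cols matrix with small uniform random entries."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class BottleneckAdapter:
    """Hypothetical adapter applied to one token's MHSA output vector.

    Assumed form: out = x + W_up @ relu(W_down @ x). This is a generic
    bottleneck adapter, not necessarily the paper's DAT module.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = make_matrix(bottleneck, dim, 0.02, rng)
        # Zero-initialised up-projection: the adapter starts as the identity,
        # so the frozen backbone's behaviour is unchanged before training.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, attn_out):
        hidden = [max(0.0, h) for h in matvec(self.down, attn_out)]
        delta = matvec(self.up, hidden)
        return [x + d for x, d in zip(attn_out, delta)]

    def num_params(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


def trainable_fraction(trainable, frozen):
    """Share of all parameters that are updated during training."""
    return trainable / (trainable + frozen)
```

With a frozen backbone, only the adapter (and, in the paper's setting, the prompt) parameters would be updated, which is how a trainable share such as 7.4% of the model can arise; for example, `trainable_fraction(74, 926)` evaluates to 0.074 for made-up counts chosen purely for illustration.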

