Exploiting the Textual Potential from Vision-Language Pre-training for Text-based Person Search

03/08/2023
by Guanshuo Wang, et al.

Text-based Person Search (TPS) aims to retrieve pedestrians that match text descriptions rather than query images. Recent Vision-Language Pre-training (VLP) models bring transferable knowledge to downstream TPS tasks, yielding performance gains more efficiently. However, existing TPS methods that benefit from VLP use only the pre-trained visual encoder; they neglect the corresponding textual representation and break the modality alignment learned during large-scale pre-training. In this paper, we explore how to fully exploit the textual potential of VLP in TPS tasks. We build on our proposed VLP-TPS baseline, the first TPS model in which both modalities are pre-trained. We propose Multi-Integrity Description Constraints (MIDC) to make the textual modality more robust by incorporating different components of the fine-grained corpus during training. Inspired by the prompt approach to zero-shot classification with VLP models, we propose the Dynamic Attribute Prompt (DAP), which provides a unified corpus of fine-grained attributes as language hints for the image modality. Extensive experiments show that the proposed TPS framework achieves state-of-the-art performance, exceeding the previous best method by a margin.
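
To make the retrieval setting concrete, the following is a minimal sketch of how a text query can be ranked against a gallery of pedestrian images once both modalities live in a shared embedding space. It is not the VLP-TPS model, MIDC, or DAP: the function names `cosine` and `rank_gallery`, the image ids, and the three-dimensional toy vectors are all hypothetical, and in practice the embeddings would come from pre-trained text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(text_embedding, gallery):
    """Return gallery image ids sorted by descending similarity to the text query."""
    scored = [(cosine(text_embedding, emb), image_id)
              for image_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Hand-made toy embeddings (assumption: stand-ins for encoder outputs).
gallery = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.9, 0.2],
    "img_c": [0.5, 0.5, 0.5],
}
# Toy embedding of a text description (assumption: stand-in for a text encoder output).
query = [0.2, 1.0, 0.1]

ranking = rank_gallery(query, gallery)
print(ranking)
```

The top-ranked id is the retrieved pedestrian image. This ranking step only works well when the text and image embeddings are aligned, which is why keeping the pre-trained text encoder alongside the visual one matters in this setting.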

research
11/09/2021

FILIP: Fine-grained Interactive Language-Image Pre-Training

Unsupervised large-scale vision-language pre-training has shown promisin...
research
05/15/2023

PLIP: Language-Image Pre-training for Person Representation Learning

Pre-training has emerged as an effective technique for learning powerful...
research
09/04/2023

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

The pre-training task is indispensable for the text-to-image person re-i...
research
07/04/2023

LPN: Language-guided Prototypical Network for few-shot classification

Few-shot classification aims to adapt to new tasks with limited labeled ...
research
02/10/2023

Towards Text-based Human Search and Approach with an Intelligent Robot Dog

In this paper, we propose a SOCratic model for Robots Approaching humans...
research
06/05/2023

Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark

In this paper, we introduce a large Multi-Attribute and Language Search ...
research
08/31/2023

ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation

Vision-language pre-training (VLP) methods are blossoming recently, and ...