Learning Granularity-Unified Representations for Text-to-Image Person Re-identification

07/16/2022
by   Zhiyin Shao, et al.
Text-to-image person re-identification (ReID) aims to search for pedestrian images of an identity of interest via textual descriptions. It is challenging due to both rich intra-modal variations and significant inter-modal gaps. Existing works usually ignore the difference in feature granularity between the two modalities, i.e., visual features are usually fine-grained while textual features are coarse, which is largely responsible for the large inter-modal gaps. In this paper, we propose an end-to-end framework based on transformers that learns granularity-unified representations for both modalities, denoted as LGUR. The LGUR framework contains two modules: a Dictionary-based Granularity Alignment (DGA) module and a Prototype-based Granularity Unification (PGU) module. In DGA, to align the granularities of the two modalities, we introduce a Multi-modality Shared Dictionary (MSD) to reconstruct both visual and textual features. Moreover, DGA incorporates two important factors, i.e., cross-modality guidance and foreground-centric reconstruction, to facilitate the optimization of the MSD. In PGU, we adopt a set of shared and learnable prototypes as queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes ReID performance. Comprehensive experiments show that LGUR consistently outperforms state-of-the-art methods by large margins on both the CUHK-PEDES and ICFG-PEDES datasets. Code will be released at https://github.com/ZhiyinShao-H/LGUR.
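The core mechanism the abstract describes, reconstructing both modalities' features from a shared dictionary and then extracting them with shared prototype queries, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: all shapes, function names, and the plain softmax-attention formulation are illustrative assumptions, and the learnable dictionary and prototypes are stand-ins initialized at random.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct_with_dictionary(features, dictionary):
    # Express each feature as an attention-weighted mixture of shared
    # dictionary atoms, so both modalities end up in the same "vocabulary".
    attn = softmax(features @ dictionary.T)   # (N, K): affinity to each atom
    return attn @ dictionary                  # (N, D): granularity-unified features

def extract_with_prototypes(prototypes, features):
    # Shared prototypes act as queries that cross-attend to the
    # reconstructed features of either modality.
    attn = softmax(prototypes @ features.T)   # (P, N): attention per prototype
    return attn @ features                    # (P, D): one feature per prototype

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((32, 64))    # K=32 shared atoms, dim D=64
prototypes = rng.standard_normal((8, 64))     # P=8 shared prototype queries
visual  = rng.standard_normal((6, 64))        # fine-grained visual tokens
textual = rng.standard_normal((4, 64))        # coarser textual tokens

v_rec = reconstruct_with_dictionary(visual, dictionary)
t_rec = reconstruct_with_dictionary(textual, dictionary)
v_out = extract_with_prototypes(prototypes, v_rec)   # (8, 64)
t_out = extract_with_prototypes(prototypes, t_rec)   # (8, 64), directly comparable
```

Because both modalities are rewritten in terms of the same atoms and queried by the same prototypes, `v_out` and `t_out` live in one shared feature space, which is the granularity-unification idea; in the paper this is realized with transformer layers and trained end-to-end.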


Related research:

- 12/01/2022: Learning Progressive Modality-shared Transformers for Effective Visible-Infrared Person Re-identification
- 07/27/2021: Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification
- 12/13/2021: Learning Semantic-Aligned Feature Representation for Text-based Person Search
- 01/19/2021: AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID
- 11/15/2017: Dual-Path Convolutional Image-Text Embedding with Instance Loss
- 03/15/2023: Mining False Positive Examples for Text-Based Person Re-identification
- 06/09/2022: Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification
