Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification

04/19/2023
by Suncheng Xiang, et al.

Generalizable person re-identification (Re-ID) is an active research topic in machine learning and computer vision, with applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning and neglect the potential of semantic features during training, which easily leads to poor generalization when a model is adapted to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer, called MMET, for more robust visual-semantic embedding learning on visual, textual, and visual-textual tasks. To further enhance robust feature learning in the transformer, we introduce a dynamic masking mechanism, the Masked Multimodal Modeling (MMM) strategy, which masks both image patches and text tokens. It works jointly on multimodal or unimodal data and significantly boosts the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method can advance research on visual-semantic representation learning. Our source code is publicly available at https://github.com/JeremyXSC/MMET.
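
The idea of masking both image patches and text tokens, on multimodal or unimodal input, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `[MASK]` placeholder, the use of `None` for a masked patch, and the fixed masking ratios are all assumptions made for illustration, and plain uniform random masking stands in for whatever "dynamic" schedule the paper uses.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder, not taken from the paper


def mask_sequence(items, ratio, mask_value, rng):
    """Replace round(ratio * len(items)) randomly chosen entries with mask_value.

    Returns the masked copy and the sorted list of masked positions.
    """
    n_mask = round(ratio * len(items))
    masked_idx = set(rng.sample(range(len(items)), n_mask))
    masked = [mask_value if i in masked_idx else x for i, x in enumerate(items)]
    return masked, sorted(masked_idx)


def mask_multimodal(patches=None, tokens=None,
                    patch_ratio=0.5, token_ratio=0.15, seed=0):
    """Mask whichever modalities are present (both, or only one).

    The ratios are illustrative defaults, not values reported in the paper.
    """
    rng = random.Random(seed)
    out = {}
    if patches is not None:
        # A masked image patch is represented here by None (an assumption).
        out["patches"], out["patch_idx"] = mask_sequence(
            patches, patch_ratio, None, rng)
    if tokens is not None:
        out["tokens"], out["token_idx"] = mask_sequence(
            tokens, token_ratio, MASK_TOKEN, rng)
    return out
```

Calling `mask_multimodal(patches=..., tokens=...)` masks both modalities, while passing only one argument masks that modality alone, mirroring the abstract's claim that the strategy applies to multimodal or unimodal data.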


research
11/02/2022

Deep Multimodal Fusion for Generalizable Person Re-identification

Person re-identification plays a significant role in realistic scenarios...
research
01/05/2023

Learning Feature Recovery Transformer for Occluded Person Re-identification

One major issue that challenges person re-identification (Re-ID) is the ...
research
07/20/2023

Learning Discriminative Visual-Text Representation for Polyp Re-Identification

Colonoscopic Polyp Re-Identification aims to match a specific polyp in a...
research
08/07/2023

Part-Aware Transformer for Generalizable Person Re-identification

Domain generalization person re-identification (DG-ReID) aims to train a...
research
07/13/2021

HAT: Hierarchical Aggregation Transformers for Person Re-identification

Recently, with the advance of deep Convolutional Neural Networks (CNNs),...
research
08/11/2020

Unified Representation Learning for Cross Model Compatibility

We propose a unified representation learning framework to address the Cr...
research
07/05/2022

Multi-modal Robustness Analysis Against Language and Visual Perturbations

Joint visual and language modeling on large-scale datasets has recently ...
