AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID

01/19/2021
by Ammarah Farooq, et al.

Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align inter-modality representations according to the semantic information present for a person while ignoring background information. In this work, we present AXM-Net, a novel CNN-based architecture designed for learning semantically aligned visual and textual representations. The underlying building block consists of multiple streams of feature maps coming from the visual and textual modalities and a novel learnable context-sharing semantic alignment network. We also propose complementary intra-modal attention learning mechanisms to focus on more fine-grained local details in the features, along with a cross-modal affinity loss for robust feature matching. Our design is unique in its ability to implicitly learn feature alignments from data. The entire AXM-Net can be trained in an end-to-end manner. We report results on both person search and cross-modal Re-ID tasks. Extensive experimentation validates the proposed framework and demonstrates its superiority by outperforming the current state-of-the-art methods by a significant margin.
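To make the person search task concrete, the sketch below shows the generic retrieval step that follows once text and image features live in a shared space: each text query ranks the image gallery by cosine similarity, and rank-1 accuracy counts how often the top match has the right identity. This is not the AXM-Net architecture or its affinity loss. The function names, the two-dimensional embeddings, and the identity labels are all made-up assumptions for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def rank_gallery(text_emb, image_emb):
    """For each text query, return gallery indices sorted by decreasing cosine similarity."""
    sim = l2_normalize(text_emb) @ l2_normalize(image_emb).T  # (num_queries, num_gallery)
    return np.argsort(-sim, axis=1)

def rank1_accuracy(ranking, query_ids, gallery_ids):
    """Fraction of queries whose top-ranked gallery image has the same identity."""
    top1 = gallery_ids[ranking[:, 0]]
    return float(np.mean(top1 == query_ids))

# Toy embeddings standing in for the outputs of some text and image encoders (hypothetical).
text_emb = np.array([[1.0, 0.1], [0.0, 1.0]])
image_emb = np.array([[0.1, 0.9], [0.9, 0.0], [0.5, 0.5]])
query_ids = np.array([7, 3])      # identity described by each text query
gallery_ids = np.array([3, 7, 5])  # identity shown in each gallery image

ranking = rank_gallery(text_emb, image_emb)
print(ranking)
print(rank1_accuracy(ranking, query_ids, gallery_ids))
```

In a real system the embeddings would come from trained encoders, and the quality of the ranking would depend on how well those encoders align the two modalities.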

research
09/20/2020

Dual-path CNN with Max Gated block for Text-Based Person Re-identification

Text-based person re-identification (Re-id) is an important task in video...
research
06/23/2019

Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments

Description-based person re-identification (Re-id) is an important task ...
research
08/17/2021

Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences

We address the problem of visible-infrared person re-identification (VI-...
research
02/27/2022

DXM-TransFuse U-net: Dual Cross-Modal Transformer Fusion U-net for Automated Nerve Identification

Accurate nerve identification is critical during surgical procedures for...
research
07/16/2022

Learning Granularity-Unified Representations for Text-to-Image Person Re-identification

Text-to-image person re-identification (ReID) aims to search for pedestr...
research
08/04/2022

Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification

Thanks for the cross-modal retrieval techniques, visible-infrared (RGB-I...
research
06/09/2022

Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification

In addition to considering the recognition difficulty caused by human po...
