Video-based Person Re-identification without Bells and Whistles

05/22/2021
by Chih-Ting Liu, et al.

Video-based person re-identification (Re-ID) aims to match video tracklets, composed of cropped video frames, in order to identify pedestrians across different cameras. However, these cropped tracklets suffer from severe spatial and temporal misalignment because the detection and tracking results, generated with obsolete methods, are imperfect. To address this issue, we present a simple re-Detect and Link (DL) module that effectively reduces this noise by applying deep learning-based detection and tracking to the cropped tracklets. Furthermore, we introduce an improved model called Coarse-to-Fine Axial-Attention Network (CF-AAN). Starting from the typical Non-local Network, we replace the non-local module with three 1-D position-sensitive axial attentions and add our proposed coarse-to-fine structure. Compared to the original non-local operation, the developed CF-AAN not only significantly reduces the computation cost but also obtains state-of-the-art performance (91.3 ...) on the large-scale MARS dataset. Meanwhile, to our surprise, simply adopting our DL module for data alignment lets several baseline models achieve results better than or comparable to the current state of the art. In addition, we discover errors both in the identity labels of tracklets and in the evaluation protocol for the MARS test data. We hope that our work helps the community further develop invariant representations without the hassle of spatial and temporal alignment and dataset noise. The code, corrected labels, evaluation protocol, and aligned data will be available at https://github.com/jackie840129/CF-AAN.
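
To illustrate why replacing one non-local operation with three 1-D axial attentions can reduce computation, the following is a minimal sketch and not the CF-AAN implementation. It assumes plain dot-product self-attention with no learned projections, and it omits both the position-sensitive terms and the coarse-to-fine structure. The names `attend`, `axial_attention`, and `attention_pairs`, as well as the dictionary-based feature map, are hypothetical choices made for readability; a real model would use a tensor library.

```python
import math
from itertools import product


def attend(seq):
    """Dot-product self-attention over a list of equal-length vectors.

    Assumption: queries, keys and values are the input vectors themselves.
    """
    scale = math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, seq)) / total
            for c in range(len(q))
        ])
    return out


def axial_attention(feat, shape, axis):
    """Attend along a single axis of a (T, H, W) grid of feature vectors.

    feat maps each (t, h, w) index to a feature vector; axis is 0, 1 or 2.
    Each position only sees the positions that share its other two indices.
    """
    other_axes = [a for a in range(3) if a != axis]
    out = {}
    for fixed in product(*(range(shape[a]) for a in other_axes)):
        line = []
        for i in range(shape[axis]):
            idx = [0, 0, 0]
            idx[axis] = i
            for a, value in zip(other_axes, fixed):
                idx[a] = value
            line.append(tuple(idx))
        for idx, vec in zip(line, attend([feat[i] for i in line])):
            out[idx] = vec
    return out


def attention_pairs(shape):
    """Count query-key pairs: full non-local vs. three axial passes."""
    t, h, w = shape
    n = t * h * w
    return n * n, n * (t + h + w)


if __name__ == "__main__":
    shape = (4, 16, 8)  # hypothetical T, H, W of a feature map
    full, axial = attention_pairs(shape)
    print(f"non-local pairs: {full}, axial pairs: {axial}")
```

Under these assumptions, a full non-local operation compares every position with every other position, so the number of pairs grows with (T·H·W)², while three axial passes grow with T·H·W·(T+H+W). The actual savings in CF-AAN depend on details that the abstract does not give.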
