Spatiotemporal Transformer for Video-based Person Re-identification

03/30/2021
by Tianyu Zhang, et al.

Recently, the Transformer module has been transplanted from natural language processing to computer vision. This paper applies the Transformer to video-based person re-identification, where the key issue is extracting discriminative information from a tracklet. We show that, despite its strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting, arguably due to its large number of attention parameters and insufficient training data. To solve this problem, we propose a novel pipeline in which the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains with the perception-constrained Spatiotemporal Transformer (STT) module and the Global Transformer (GT) module. The derived algorithm achieves significant accuracy gains on three popular video-based person re-identification benchmarks, MARS, DukeMTMC-VideoReID, and LS-VID, especially when the training and testing data come from different domains. More importantly, our research sheds light on the application of the Transformer to highly structured visual data.
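
The abstract does not spell out how the STT module works internally, so the following is only a rough sketch of one common way to apply attention to a tracklet: attend over patches within each frame, then over frames at each patch position, and pool the result into a single descriptor. Everything here is an assumption for illustration: the function names (`self_attention`, `spatial_then_temporal`, `tracklet_descriptor`), the identity query/key/value projections, the average pooling, and the toy input are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: no learned weights, so this only shows the data flow.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score between this token and every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def spatial_then_temporal(tracklet):
    """tracklet[t][n] is the feature vector of patch n in frame t."""
    # Spatial step: patches attend to each other within the same frame.
    spatial = [self_attention(frame) for frame in tracklet]
    num_frames, num_patches = len(spatial), len(spatial[0])
    # Temporal step: each patch position attends across frames.
    temporal = [[None] * num_patches for _ in range(num_frames)]
    for n in range(num_patches):
        track = self_attention([spatial[t][n] for t in range(num_frames)])
        for t in range(num_frames):
            temporal[t][n] = track[t]
    return temporal

def tracklet_descriptor(tracklet):
    """Average all refined tokens into one vector per tracklet (hypothetical pooling)."""
    tokens = [tok for frame in spatial_then_temporal(tracklet) for tok in frame]
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]

# Toy tracklet: 3 frames, 2 patches per frame, 4-dimensional features.
toy = [[[float(t + n + d) for d in range(4)] for n in range(2)] for t in range(3)]
descriptor = tracklet_descriptor(toy)
print(len(descriptor))
```

In a trained model each attention step would carry learned projection matrices, and those parameters are what the abstract identifies as a likely source of over-fitting when training data is scarce.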

research
03/17/2019

STNReID : Deep Convolutional Networks with Pairwise Spatial Transformer Networks for Partial Person Re-identification

Partial person re-identification (ReID) is a challenging task because on...
research
01/04/2022

Short Range Correlation Transformer for Occluded Person Re-Identification

Occluded person re-identification is one of the challenging areas of com...
research
12/08/2020

UnrealPerson: An Adaptive Pipeline towards Costless Person Re-identification

The main difficulty of person re-identification (ReID) lies in collectin...
research
01/10/2022

Multi-Level Attention for Unsupervised Person Re-Identification

The attention mechanism is widely used in deep learning because of its e...
research
09/23/2021

OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification

Transformers have shown preferable performance on many vision tasks. How...
research
12/05/2022

Generalizable Person Re-Identification via Viewpoint Alignment and Fusion

In the current person Re-identification (ReID) methods, most domain gene...
research
04/01/2023

SVT: Supertoken Video Transformer for Efficient Video Understanding

Whether by processing videos with fixed resolution from start to end or ...
