Where-and-When to Look: Deep Siamese Attention Networks for Video-based Person Re-identification

08/03/2018
by Lin Wu, et al.

Video-based person re-identification (re-id) is a central application in surveillance systems and a significant concern in security. Matching persons across disjoint camera views from their video fragments is inherently challenging due to large visual variations and uncontrolled frame rates. Two steps are crucial to person re-id: discriminative feature learning and metric learning. However, existing approaches treat the two steps independently and do not make full use of the temporal and spatial information in videos. In this paper, we propose a Siamese attention architecture that jointly learns spatiotemporal video representations and their similarity metrics. The network extracts local convolutional features from regions of each frame and enhances their discriminative capability by focusing on distinct regions when measuring the similarity with another pedestrian video. The attention mechanism is embedded into spatial gated recurrent units to selectively propagate relevant features and memorize their spatial dependencies through the network. The model essentially learns which parts (where) from which frames (when) are relevant and distinctive for matching persons, and assigns them higher importance. The proposed Siamese model is end-to-end trainable and jointly learns comparable hidden representations for paired pedestrian videos and their similarity value. Extensive experiments on three benchmark datasets show the effectiveness of each component of the proposed deep network, which outperforms state-of-the-art methods.
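The "where" and "when" idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's model: the spatial gated recurrent units and all learned parameters are replaced here by plain dot-product attention, and the names `attend` and `siamese_similarity`, the mean-pooled query, and the cosine similarity are assumptions made only for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(video, query):
    """Pool a video conditioned on a summary of the other video.

    video: (T, R, D) array of D-dim features for R regions in T frames.
    query: (D,) summary vector of the video being compared against.
    Returns the pooled (D,) vector, spatial weights (T, R), temporal weights (T,).
    """
    # "Where": weight the regions within each frame by relevance to the query.
    spatial = softmax(video @ query, axis=1)             # (T, R)
    frames = (spatial[..., None] * video).sum(axis=1)    # (T, D)
    # "When": weight the frames by relevance to the query.
    temporal = softmax(frames @ query, axis=0)           # (T,)
    pooled = (temporal[:, None] * frames).sum(axis=0)    # (D,)
    return pooled, spatial, temporal

def siamese_similarity(video_a, video_b):
    # Both branches share the same (parameter-free) attention function;
    # each video is pooled using the mean feature of the other as its query.
    query_a = video_a.mean(axis=(0, 1))
    query_b = video_b.mean(axis=(0, 1))
    rep_a, _, _ = attend(video_a, query_b)
    rep_b, _, _ = attend(video_b, query_a)
    denom = np.linalg.norm(rep_a) * np.linalg.norm(rep_b) + 1e-12
    return float(rep_a @ rep_b / denom)  # cosine similarity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(5, 4, 8))   # 5 frames, 4 regions, 8-dim features
    b = rng.normal(size=(7, 4, 8))   # the two videos may differ in length
    print(siamese_similarity(a, b))
```

Because each video is pooled against a summary of the other, the resulting representation depends on the pair being compared, which is the property the abstract describes. A trained model would learn the scoring functions rather than use raw dot products.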


research
09/09/2020

Temporal Attribute-Appearance Learning Network for Video-based Person Re-Identification

Video-based person re-identification aims to match a specific pedestrian...
research
06/06/2016

Deep Recurrent Convolutional Networks for Video-based Person Re-identification: An End-to-End Approach

In this paper, we present an end-to-end approach to simultaneously learn...
research
04/30/2018

Deep Co-attention based Comparators For Relative Representation Learning in Person Re-identification

Person re-identification (re-ID) requires rapid, flexible yet discrimina...
research
07/21/2017

What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification

Matching pedestrians across disjoint camera views, known as person re-id...
research
03/27/2018

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Typical person re-identification (ReID) methods usually describe each pe...
research
11/19/2018

Re-Identification with Consistent Attentive Siamese Networks

We propose a new deep architecture for person re-identification (re-id)....
research
10/07/2020

Channel Recurrent Attention Networks for Video Pedestrian Retrieval

Full attention, which generates an attention value per element of the in...
