Transformer-Based Deep Image Matching for Generalizable Person Re-identification

05/30/2021
by   Shengcai Liao, et al.

Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense prediction. In this work, we further investigate the possibility of applying Transformers to image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. We therefore design two naive solutions: query-gallery concatenation in ViT, and query-gallery cross-attention in the vanilla Transformer. The latter improves performance, but it is still limited. This implies that the attention mechanism in Transformers is primarily designed for global feature aggregation, which is not naturally suited to image matching. Accordingly, we propose a new simplified decoder, which drops the full attention implementation with its softmax weighting, keeping only the query-key similarity computation. Additionally, global max pooling and a multilayer perceptron (MLP) head are applied to decode the matching result. This way, the simplified decoder is computationally more efficient, while at the same time more effective for image matching. The proposed method, called TransMatcher, achieves state-of-the-art performance in generalizable person re-identification, with performance gains of up to 6.1% in Rank-1 and mAP on several popular datasets. The source code of this study will be made publicly available.
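To make the simplified decoder concrete, here is a minimal NumPy sketch of the matching step the abstract describes: pairwise query-key similarities without softmax attention, global max pooling over gallery positions, and an MLP head that decodes a match score. All names, shapes, and the ReLU activation are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np


def simplified_decoder_score(q_feats, g_feats, w1, b1, w2, b2):
    """Hypothetical sketch of a TransMatcher-style simplified decoder.

    q_feats: (Nq, d) patch features of the query image
    g_feats: (Ng, d) patch features of the gallery image
    w1, b1, w2, b2: illustrative MLP-head parameters
    Returns a scalar matching score for the image pair.
    """
    # Query-key similarity only: no softmax weighting, no value aggregation.
    sim = q_feats @ g_feats.T              # (Nq, Ng) similarity map

    # Global max pooling: best-matching gallery position per query position.
    pooled = sim.max(axis=1)               # (Nq,)

    # MLP head (assumed ReLU) decoding the pooled similarities into a score.
    hidden = np.maximum(0.0, pooled @ w1 + b1)
    return float(hidden @ w2 + b2)


# Toy usage with random features and parameters.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))           # 8 query patches, 16-dim features
g = rng.standard_normal((10, 16))          # 10 gallery patches
w1 = rng.standard_normal((8, 4))
b1 = np.zeros(4)
w2 = rng.standard_normal(4)
b2 = 0.0
score = simplified_decoder_score(q, g, w1, b1, w2, b2)
```

Because the softmax and value projection are dropped, the per-pair cost is a single matrix product plus a max reduction, which is where the claimed efficiency over a full attention decoder comes from.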


Related research

07/18/2022 · Multi-manifold Attention for Vision Transformers
Vision Transformers are very popular nowadays due to their state-of-the-a...

07/25/2022 · Deep Laparoscopic Stereo Matching with Transformers
The self-attention mechanism, successfully employed with the transformer...

10/15/2022 · Linear Video Transformer with Feature Fixation
Vision Transformers have achieved impressive performance in video classi...

09/23/2021 · OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification
Transformers have shown preferable performance on many vision tasks. How...

04/16/2022 · Efficient Linear Attention for Fast and Accurate Keypoint Matching
Recently Transformers have provided state-of-the-art performance in spar...

04/23/2019 · Interpretable and Generalizable Deep Image Matching with Adaptive Convolutions
For image matching tasks, like face recognition and person re-identifica...

06/07/2021 · Person Re-Identification with a Locally Aware Transformer
Person Re-Identification is an important problem in computer vision-base...
