Learning Tracking Representations via Dual-Branch Fully Transformer Networks

12/05/2021
by Fei Xie, et al.

We present a Siamese-like dual-branch network based solely on Transformers for tracking. Given a template and a search image, we divide them into non-overlapping patches and extract a feature vector for each patch based on its matching results with the other patches within an attention window. For each token, we estimate whether it contains the target object and the corresponding size. The advantage of the approach is that the features are learned from matching and, ultimately, for matching, so they are aligned with the object tracking task. The method achieves results better than or comparable to those of the best-performing methods that first use a CNN to extract features and then use a Transformer to fuse them. It outperforms the state-of-the-art methods on the GOT-10k and VOT2020 benchmarks. In addition, the method achieves real-time inference speed (about 40 fps) on one GPU. The code and models will be released.
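The patch-division step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_into_patches`, the nested-list image type, and the toy 4x4 input are all assumptions made for illustration, and the attention-window matching and prediction heads are not shown.

```python
from typing import List

Image = List[List[float]]  # H x W grayscale image as nested lists (illustrative type)


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles.

    Returns one flattened vector (token) per patch, in row-major patch order.
    Hypothetical helper, not taken from the paper's code.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the pixels of one tile into a single token vector.
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


# A toy 4x4 "search image" whose pixel values equal their row-major index.
search = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
search_tokens = split_into_patches(search, patch=2)
print(len(search_tokens), search_tokens[0])
```

In a tracker of the kind described, the template and the search image would each be tokenized this way before the tokens are compared with one another; the patch size here is arbitrary.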


research
08/22/2018

Multi-Branch Siamese Networks with Online Selection for Object Tracking

In this paper, we propose a robust object tracking algorithm based on a ...
research
02/24/2018

A Twofold Siamese Network for Real-Time Object Tracking

Observing that Semantic features learned in an image classification task...
research
08/24/2023

Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking

Siamese network has been a de facto benchmark framework for 3D LiDAR obj...
research
03/03/2022

Correlation-Aware Deep Tracking

Robustness and discrimination power are two fundamental requirements in ...
research
09/15/2019

GradNet: Gradient-Guided Network for Visual Object Tracking

The fully-convolutional siamese network based on template matching has s...
research
02/08/2021

TransReID: Transformer-based Object Re-Identification

In this paper, we explore the Vision Transformer (ViT), a pure transform...
research
07/03/2022

Divert More Attention to Vision-Language Tracking

Relying on Transformer for complex visual feature learning, object track...
