Efficient Training for Visual Tracking with Deformable Transformer

09/06/2023
by Qingmao Wei, et al.

Recent Transformer-based visual tracking models have achieved superior performance. However, prior works are resource-intensive: inefficient training methods demand long GPU training hours, and convolution-based target heads incur high GFLOPs during inference. This resource cost makes them unsuitable for real-world applications. In this paper, we present DETRack, a streamlined end-to-end visual object tracking framework. Our framework uses an efficient encoder-decoder structure in which a deformable transformer decoder, acting as the target head, achieves higher sparsity than traditional convolution heads and therefore lower GFLOPs. For training, we introduce a novel one-to-many label assignment and an auxiliary denoising technique, which significantly accelerate the model's convergence. Comprehensive experiments affirm the effectiveness and efficiency of the proposed method. For instance, DETRack achieves 72.9% AO on the challenging GOT-10k benchmark using only 20% of the training epochs required by the baseline, and runs with lower GFLOPs than all the transformer-based trackers.

