MixFormerV2: Efficient Fully Transformer Tracking

05/25/2023
by Yutao Cui, et al.

Transformer-based trackers have achieved strong accuracy on standard benchmarks. However, their efficiency remains an obstacle to practical deployment on both GPU and CPU platforms. In this paper, to overcome this issue, we propose a fully transformer tracking framework, coined MixFormerV2, without any dense convolutional operations or a complex score prediction module. Our key design is to introduce four special prediction tokens and concatenate them with the tokens from the target template and search area. We then apply a unified transformer backbone to this mixed token sequence. The prediction tokens capture the complex correlation between the target template and the search area via mixed attention. Based on them, we can easily predict the tracking box and estimate its confidence score through simple MLP heads. To further improve the efficiency of MixFormerV2, we present a new distillation-based model reduction paradigm, including dense-to-sparse distillation and deep-to-shallow distillation. The former transfers knowledge from the dense-head-based MixViT to our fully transformer tracker, while the latter prunes layers of the backbone. We instantiate two versions of MixFormerV2: MixFormerV2-B achieves an AUC of 70.6% on LaSOT and 57.4% on TNL2K at a high GPU speed of 165 FPS, and MixFormerV2-S surpasses FEAR-L by 2.7% AUC on LaSOT while running at real-time CPU speed.
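The following PyTorch-style sketch illustrates the prediction-token idea described in the abstract: four learnable tokens are concatenated with the template and search tokens, passed through a unified transformer backbone, and decoded by simple MLP heads into a box and a confidence score. All module names, dimensions, and the use of a standard self-attention encoder (standing in for the paper's mixed attention) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a fully transformer tracker with prediction tokens.
# Dimensions, head design, and the plain self-attention backbone are assumptions.
import torch
import torch.nn as nn


class FullyTransformerTracker(nn.Module):
    def __init__(self, embed_dim=768, depth=12, num_heads=12, num_pred_tokens=4):
        super().__init__()
        # Four learnable prediction tokens, concatenated with template/search tokens.
        self.pred_tokens = nn.Parameter(torch.zeros(1, num_pred_tokens, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        # Unified transformer backbone over the mixed token sequence
        # (standard self-attention used here in place of the paper's mixed attention).
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        # Simple MLP heads: one scalar box coordinate per prediction token,
        # plus a confidence score from the pooled prediction tokens.
        self.box_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, 1)
        )
        self.score_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, template_tokens, search_tokens):
        # template_tokens: (B, N_t, C), search_tokens: (B, N_s, C)
        B = template_tokens.size(0)
        pred = self.pred_tokens.expand(B, -1, -1)
        mixed = torch.cat([pred, template_tokens, search_tokens], dim=1)
        mixed = self.backbone(mixed)
        pred = mixed[:, : self.pred_tokens.size(1)]           # (B, 4, C)
        box = self.box_head(pred).squeeze(-1).sigmoid()       # (B, 4) normalized box
        score = self.score_head(pred.mean(dim=1)).sigmoid()   # (B, 1) confidence
        return box, score


# Example usage with hypothetical token counts: 64 template and 256 search tokens.
tracker = FullyTransformerTracker()
z = torch.randn(2, 64, 768)    # template tokens
x = torch.randn(2, 256, 768)   # search tokens
box, score = tracker(z, x)     # box: (2, 4), score: (2, 1)
```

Because the box and score come directly from the prediction tokens, no dense convolutional head or separate score prediction module is needed; the distillation steps mentioned in the abstract (dense-to-sparse and deep-to-shallow) would then be applied on top of such a model to recover accuracy while shrinking it.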

09/07/2023

Separable Self and Mixed Attention Transformers for Efficient Object Tracking

The deployment of transformers for visual object tracking has shown stat...
03/29/2021

Transformer Tracking

Correlation acts as a critical role in the tracking field, especially in...
03/25/2022

High-Performance Transformer Tracking

Correlation has a critical role in the tracking field, especially in rec...
01/26/2023

Compact Transformer Tracker with Correlative Masked Modeling

Transformer framework has been showing superior performances in visual o...
10/17/2021

Siamese Transformer Pyramid Networks for Real-Time UAV Tracking

Recent object tracking methods depend upon deep networks or convoluted a...
03/21/2022

Transforming Model Prediction for Tracking

Optimization based tracking methods have been widely successful by integ...
09/17/2023

LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking

The recent advancements in transformer-based visual trackers have led to...
