SparseTT: Visual Tracking with Sparse Transformers

05/08/2022
by Zhihong Fu, et al.

Transformers have been successfully applied to visual tracking and have significantly improved tracking performance. The self-attention mechanism, designed to model long-range dependencies, is the key to the success of Transformers. However, self-attention does not focus on the most relevant information in the search regions, making it easily distracted by the background. In this paper, we alleviate this issue with a sparse attention mechanism that concentrates on the most relevant information in the search regions, enabling much more accurate tracking. Furthermore, we introduce a double-head predictor to boost the accuracy of foreground-background classification and the regression of target bounding boxes, which further improves tracking performance. Extensive experiments show that, without bells and whistles, our method significantly outperforms state-of-the-art approaches on LaSOT, GOT-10k, TrackingNet, and UAV123, while running at 40 FPS. Notably, the training time of our method is reduced by 75%. The source code and models are available at https://github.com/fzh0917/SparseTT.
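The abstract does not spell out how the sparsification is performed; one common formulation consistent with the description is top-k attention, where each query attends only to its k highest-scoring keys and all other positions are masked out before the softmax. Below is a minimal PyTorch sketch under that assumption; the function name `topk_sparse_attention` and the `top_k` parameter are illustrative, not taken from the SparseTT codebase.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=32):
    """Scaled dot-product attention where each query keeps only its
    top_k highest-scoring keys (hypothetical sketch, not SparseTT's code).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # (B, H, Lq, Lk)
    top_k = min(top_k, scores.size(-1))
    # The k-th largest score in each row acts as a per-query threshold
    kth = scores.topk(top_k, dim=-1).values[..., -1:]       # (B, H, Lq, 1)
    # Mask everything below the threshold so softmax assigns it zero weight
    scores = scores.masked_fill(scores < kth, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Toy usage: a 20x20 search-region feature map flattened to 400 tokens
q = k = v = torch.randn(1, 8, 400, 64)
out = topk_sparse_attention(q, k, v, top_k=32)   # shape (1, 8, 400, 64)
```

Restricting each query to a small set of keys keeps the attention map peaked on target-like regions rather than spread over the whole search area, which matches the intuition behind the claimed robustness to background distractors.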

