SwinTrack: A Simple and Strong Baseline for Transformer Tracking

12/02/2021
by Liting Lin, et al.

Transformers have recently demonstrated clear potential for improving visual tracking algorithms. Nevertheless, existing Transformer-based trackers mostly use the Transformer to fuse and enhance features generated by convolutional neural networks (CNNs). By contrast, in this paper we propose a fully attention-based Transformer tracking algorithm, Swin-Transformer Tracker (SwinTrack). SwinTrack uses Transformers for both feature extraction and feature fusion, allowing full interaction between the target object and the search region. To further improve performance, we comprehensively investigate different strategies for feature fusion, position encoding, and training loss. All these efforts make SwinTrack a simple yet solid baseline. In our thorough experiments, SwinTrack sets a new record of 0.717 SUC on LaSOT, surpassing STARK by 4.6% while still running at 45 FPS. It also achieves state-of-the-art performance with 0.483 SUC, 0.832 SUC, and 0.694 AO on the challenging LaSOT_ext, TrackingNet, and GOT-10k benchmarks, respectively. Our implementation and trained models are available at https://github.com/LitingLin/SwinTrack.
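For readers unfamiliar with the SUC numbers quoted above, the following is a minimal sketch of how a success score is commonly computed in one-pass tracking evaluation: per-frame IoU between predicted and ground-truth boxes, a success rate at each overlap threshold, and the mean of those rates as the area under the success curve. The function names `iou` and `success_auc`, the `(x, y, w, h)` box format, and the choice of 21 evenly spaced thresholds are assumptions made for illustration; the official toolkit of each benchmark may differ in detail, and this is not taken from the SwinTrack code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    Assumption: a frame counts as a success at threshold t when IoU > t.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    if not overlaps:
        return 0.0
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy sequence of two frames with hypothetical boxes.
    gt = [(0, 0, 10, 10), (5, 5, 10, 10)]
    pred = [(1, 1, 10, 10), (5, 5, 10, 10)]
    print(round(success_auc(pred, gt), 3))
```

Under this convention a tracker that matches the ground truth on every frame scores slightly below 1.0, because no overlap can exceed the final threshold of 1.0.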

