Transformer Tracking with Cyclic Shifting Window Attention

05/08/2022
by Zikai Song, et al.

The transformer architecture has shown great strength in visual object tracking owing to its effective attention mechanism. Existing transformer-based approaches adopt a pixel-to-pixel attention strategy on flattened image features and thus inevitably neglect the integrity of objects. In this paper, we propose a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating attention from the pixel level to the window level. The cross-window multi-scale attention aggregates attention at different scales and generates the best fine-scale match for the target object. Furthermore, the cyclic shifting strategy improves accuracy by enriching the window samples with positional information, while also saving a large amount of computation by removing redundant calculations. Extensive experiments demonstrate the superior performance of our method, which sets new state-of-the-art records on five challenging benchmarks: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k.
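The cyclic shifting idea is the key computational trick: rather than padding partially filled shifted windows, the feature map is rolled so that pixels wrapping off one edge re-enter at the other, and window attention is then computed as usual. The following is a minimal, illustrative PyTorch sketch of that mechanism (in the spirit of Swin-style shifted windows); the function names and the single-head, projection-free attention are simplifications chosen for clarity, not the authors' actual implementation, and the multi-scale aggregation described in the abstract is omitted.

    # Minimal sketch of cyclic-shifted window attention (illustrative only).
    import torch
    import torch.nn.functional as F

    def window_partition(x, ws):
        """Split a (B, H, W, C) feature map into (num_windows*B, ws*ws, C)."""
        B, H, W, C = x.shape
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

    def window_reverse(windows, ws, H, W):
        """Inverse of window_partition: back to (B, H, W, C)."""
        B = windows.shape[0] // ((H // ws) * (W // ws))
        x = windows.view(B, H // ws, W // ws, ws, ws, -1)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

    def cyclic_shift_window_attention(x, ws, shift):
        """Single-head window self-attention with a cyclic shift of `shift` pixels."""
        B, H, W, C = x.shape
        if shift > 0:
            # Cyclic shift: pixels rolled off one edge re-enter at the other,
            # so no padding (and no wasted computation on padded pixels) is needed.
            x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
        win = window_partition(x, ws)                            # (nW*B, ws*ws, C)
        attn = F.softmax(win @ win.transpose(-2, -1) / C**0.5, dim=-1)
        out = window_reverse(attn @ win, ws, H, W)
        if shift > 0:
            # Undo the shift to restore the original spatial layout.
            out = torch.roll(out, shifts=(shift, shift), dims=(1, 2))
        return out

    feat = torch.randn(2, 16, 16, 64)                            # toy feature map
    out = cyclic_shift_window_attention(feat, ws=4, shift=2)
    print(out.shape)  # torch.Size([2, 16, 16, 64])

Note that a full implementation would also mask attention between spatially non-adjacent pixels that the roll brings into the same window, and would add the usual query/key/value projections; both are omitted here for brevity.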


Related research

01/05/2022
Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention
Multi-scale representations are crucial for semantic segmentation. The c...

03/24/2022
Beyond Fixation: Dynamic Window Visual Transformer
Recently, a surge of interest in visual transformers is to reduce the co...

07/12/2022
Tracking Objects as Pixel-wise Distributions
Multi-object tracking (MOT) requires detecting and associating objects t...

01/26/2023
Compact Transformer Tracker with Correlative Masked Modeling
Transformer framework has been showing superior performances in visual o...

05/15/2022
Video Frame Interpolation with Transformer
Video frame interpolation (VFI), which aims to synthesize intermediate f...

09/15/2023
Leveraging the Power of Data Augmentation for Transformer-based Tracking
Due to long-distance correlation and powerful pretrained models, transfo...

06/20/2021
More than Encoder: Introducing Transformer Decoder to Upsample
General segmentation models downsample images and then upsample to resto...
