Leveraging the Power of Data Augmentation for Transformer-based Tracking

09/15/2023
by Jie Zhao, et al.

Owing to long-range correlation modeling and powerful pretrained models, transformer-based methods have brought a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited to tracking but overlook the fact that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers through systematic experiments and reveal the limited effectiveness of these common strategies. Motivated by these observations, we then propose two data augmentation methods customized for tracking. First, we improve existing random cropping with a dynamic search radius mechanism and a simulation of boundary samples. Second, we propose a token-level feature mixing augmentation strategy, which makes the model more robust to challenges such as background interference. Extensive experiments on two transformer-based trackers and six benchmarks demonstrate the effectiveness and data efficiency of our methods, especially under challenging settings such as one-shot tracking and small image resolutions.
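The abstract describes the two augmentations only at a high level, so the following is a hypothetical NumPy sketch of the general ideas, not the authors' implementation. `dynamic_search_crop` resamples the search-area factor on every call (a "dynamic" search radius) and, with some probability, shifts the window so the target sits against the crop border (a simple way to simulate boundary samples). `token_mix` replaces a random subset of background search tokens with tokens from another sequence to mimic background interference. All function names, parameters, and defaults (base factor 4.0, jitter, boundary probability, mixing ratio, same-position replacement) are assumptions made for illustration.

```python
import numpy as np


def dynamic_search_crop(image, box, base_factor=4.0, jitter=1.0,
                        boundary_prob=0.2, rng=None):
    """Crop a square search region around box = (x, y, w, h).

    Hypothetical sketch: the search-area factor (search radius) is resampled
    on every call instead of being fixed, and with probability boundary_prob
    the window is shifted so the target sits in a corner of the crop.
    Pixels outside the image are zero-padded.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y, w, h = box
    factor = base_factor + rng.uniform(-jitter, jitter)  # dynamic radius
    side = max(1, int(np.ceil(np.sqrt(w * h) * factor)))
    cx, cy = x + w / 2.0, y + h / 2.0
    if rng.random() < boundary_prob:
        # Shift by the largest offset that still keeps the whole target
        # inside the window, so the target touches a corner of the crop.
        cx += rng.choice([-1.0, 1.0]) * max(0.0, (side - w) / 2.0)
        cy += rng.choice([-1.0, 1.0]) * max(0.0, (side - h) / 2.0)
    x0 = int(round(cx - side / 2.0))
    y0 = int(round(cy - side / 2.0))
    img_h, img_w = image.shape[:2]
    crop = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    # Copy only the part of the window that overlaps the image.
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1, iy1 = min(x0 + side, img_w), min(y0 + side, img_h)
    if ix1 > ix0 and iy1 > iy0:
        crop[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = image[iy0:iy1, ix0:ix1]
    return crop, factor


def token_mix(search_tokens, other_tokens, ratio=0.3, keep_mask=None, rng=None):
    """Replace a random subset of search tokens, shape (N, C), with the tokens
    at the same positions in other_tokens (e.g. features from another video).

    Tokens flagged in keep_mask, such as those covering the target, are never
    replaced. Hypothetical sketch, not the paper's exact mixing rule.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = search_tokens.shape[0]
    keep = (np.zeros(n, dtype=bool) if keep_mask is None
            else np.asarray(keep_mask, dtype=bool))
    candidates = np.flatnonzero(~keep)  # indices of background tokens
    k = min(candidates.size, int(round(ratio * n)))
    idx = rng.choice(candidates, size=k, replace=False)
    mixed = search_tokens.copy()
    mixed[idx] = other_tokens[idx]
    replaced = np.zeros(n, dtype=bool)
    replaced[idx] = True
    return mixed, replaced
```

For example, with 256 search tokens, `ratio=0.3`, and 20 tokens marked as target in `keep_mask`, `token_mix` replaces 77 background tokens and leaves the target tokens untouched. Protecting the target tokens is a design assumption here: it keeps the supervision signal intact while the surrounding context becomes more cluttered.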

