Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking

03/10/2022
by Boyu Chen, et al.

Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive biases has recently drawn extensive interest. However, existing tracking approaches rely on customized sub-modules and need prior knowledge for architecture selection, which hinders the development of tracking within a more general system. This paper presents a Simplified Tracking architecture (SimTrack) that leverages a transformer backbone for joint feature extraction and interaction. Unlike existing Siamese trackers, we serialize the input images and concatenate them directly before the one-branch backbone. Because feature interaction happens inside the backbone, separately designed interaction modules can be removed, yielding a more efficient and effective framework. To reduce the information loss from down-sampling in vision transformers, we further propose a foveal window strategy, which provides more diverse input patches at acceptable computational cost. SimTrack improves on the baseline with gains of 2.5 on LaSOT/TNL2K and achieves results competitive with other specialized tracking algorithms without bells and whistles.
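
The serialize-and-concatenate step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `patchify` and `serialize_pair`, the 16-pixel patch size, and the 112x112 template and 224x224 search sizes are all assumptions chosen for illustration. The learned patch projection, positional embeddings, the transformer backbone itself, and the foveal window strategy are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patch tokens."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, patch * patch * c)


def serialize_pair(template: np.ndarray, search: np.ndarray, patch: int = 16) -> np.ndarray:
    """Serialize both images and concatenate them into one token sequence.

    A single (one-branch) backbone would consume this joint sequence, so
    attention layers could mix template and search tokens directly.
    """
    return np.concatenate([patchify(template, patch), patchify(search, patch)], axis=0)


# Hypothetical input sizes (assumptions, not taken from the paper).
template = np.zeros((112, 112, 3), dtype=np.float32)
search = np.ones((224, 224, 3), dtype=np.float32)

tokens = serialize_pair(template, search)
print(tokens.shape)  # 49 template tokens + 196 search tokens, each of length 768
```

With these assumed sizes the joint sequence has 245 tokens, in contrast to a Siamese design, where the two images would pass through separate branches before a dedicated interaction module.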

research
07/07/2023

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment

Current mainstream vision-language (VL) tracking framework consists of t...
research
09/07/2023

Separable Self and Mixed Attention Transformers for Efficient Object Tracking

The deployment of transformers for visual object tracking has shown stat...
research
08/19/2023

Scalable Video Object Segmentation with Simplified Framework

The current popular methods for video object segmentation (VOS) implemen...
research
06/07/2020

Siamese Keypoint Prediction Network for Visual Object Tracking

Visual object tracking aims to estimate the location of an arbitrary tar...
research
11/09/2022

Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer

We propose a light-weight and highly efficient Joint Detection and Track...
research
02/08/2021

TransReID: Transformer-based Object Re-Identification

In this paper, we explore the Vision Transformer (ViT), a pure transform...
research
11/20/2022

Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric

Combining the Color and Event cameras (also called Dynamic Vision Sensor...
