SeqTrack: Sequence to Sequence Learning for Visual Object Tracking

04/27/2023
by Xin Chen, et al.

In this paper, we present a new sequence-to-sequence learning framework for visual tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem, predicting object bounding boxes in an autoregressive fashion. This differs from prior Siamese trackers and transformer trackers, which rely on complicated, hand-designed head networks such as classification and regression heads. SeqTrack adopts only a simple encoder-decoder transformer architecture. The encoder extracts visual features with a bidirectional transformer, while the decoder generates a sequence of bounding box values autoregressively with a causal transformer. The loss function is a plain cross-entropy. Such a sequence learning paradigm not only simplifies the tracking framework but also achieves competitive performance on benchmarks. For instance, SeqTrack reaches 72.5, a state-of-the-art result. Code and models are available.
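
The idea of generating a bounding box as a token sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how box values become tokens, so the uniform quantization into `NUM_BINS` bins, the `START` token, the `[x, y, w, h]` ordering, and the toy `step_fn` decoder are all assumptions made for illustration.

```python
import math

NUM_BINS = 1000   # assumed size of the coordinate vocabulary (not from the abstract)
START = NUM_BINS  # assumed extra token id that starts decoding


def box_to_tokens(box, num_bins=NUM_BINS):
    """Quantize normalized [x, y, w, h] values in [0, 1] into integer tokens."""
    return [min(num_bins - 1, max(0, int(v * num_bins))) for v in box]


def tokens_to_box(tokens, num_bins=NUM_BINS):
    """Map tokens back to the centers of their bins in [0, 1]."""
    return [(t + 0.5) / num_bins for t in tokens]


def cross_entropy(logits, target):
    """Plain cross-entropy for one step: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def greedy_decode(step_fn, length=4):
    """Autoregressive decoding: each step sees only the tokens emitted so far."""
    seq = [START]
    for _ in range(length):
        logits = step_fn(seq)  # hypothetical stand-in for the causal decoder
        seq.append(max(range(len(logits)), key=logits.__getitem__))
    return seq[1:]


def make_toy_decoder(target_tokens, num_bins=NUM_BINS):
    """Toy step function whose logits peak at the next target token."""
    def step_fn(seq):
        logits = [0.0] * num_bins
        logits[target_tokens[len(seq) - 1]] = 5.0
        return logits
    return step_fn


target = box_to_tokens([0.25, 0.5, 0.125, 0.75])
pred = greedy_decode(make_toy_decoder(target))
box = tokens_to_box(pred)
loss = sum(cross_entropy(make_toy_decoder(target)([START] + target[:i]), t)
           for i, t in enumerate(target))
```

Under these assumptions, training reduces to summing the per-token cross-entropy over the four box tokens, and inference reduces to the decoding loop, with no separate classification or regression head. Decoding to bin centers recovers each coordinate to within half a bin width.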

research
03/31/2021

Learning Spatio-Temporal Transformer for Visual Tracking

In this paper, we present a new tracking architecture with an encoder-de...
research
09/06/2023

Efficient Training for Visual Tracking with Deformable Transformer

Recent Transformer-based visual tracking models have showcased superior ...
research
03/22/2021

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

In video object tracking, there exist rich temporal contexts among succe...
research
12/17/2020

End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box

The task object tracking is vital in numerous applications such as auton...
research
05/28/2023

Bayesian Decision Making to Localize Visual Queries in 2D

This report describes our approach for the EGO4D 2023 Visual Query 2D Lo...
research
07/03/2022

Divert More Attention to Vision-Language Tracking

Relying on Transformer for complex visual feature learning, object track...
research
08/27/2023

Towards Unified Token Learning for Vision-Language Tracking

In this paper, we present a simple, flexible and effective vision-langua...
