TrackFormer: Multi-Object Tracking with Transformers

01/07/2021
by Tim Meinhardt, et al.

We present TrackFormer, an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture. Our approach introduces track query embeddings, which follow objects through a video sequence in an autoregressive fashion. New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time. The Transformer decoder adjusts the track query embeddings from frame to frame, thereby following the changing object positions. TrackFormer achieves seamless data association between frames in a new tracking-by-attention paradigm: self-attention and encoder-decoder attention mechanisms simultaneously reason about location, occlusion, and object identity. TrackFormer yields state-of-the-art performance on the tasks of multi-object tracking (MOT17) and segmentation (MOTS20). We hope our unified way of performing detection and tracking will foster future research in multi-object tracking and video understanding. Code will be made publicly available.
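The autoregressive use of track queries described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the Transformer decoder is replaced by a hypothetical nearest-box heuristic (`toy_decoder`), the "embedding" is simply the last predicted box, and every name and threshold here (`Query`, `Output`, `track_step`, `track_video`, `keep_thresh`, `max_dist`) is an assumption made for illustration. Only the bookkeeping follows the description above: queries that survive a frame are carried into the next frame with their identity, and fresh object queries spawn new tracks.

```python
import math
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


@dataclass
class Query:
    embedding: Optional[Box]   # toy stand-in: the embedding is just the last box
    track_id: Optional[int]    # None marks a fresh object (detection) query


@dataclass
class Output:
    box: Box
    score: float
    embedding: Box


def toy_decoder(frame: Sequence[Box], queries: List[Query],
                max_dist: float = 10.0) -> List[Output]:
    """Hypothetical stand-in for the Transformer decoder (assumption).

    Track queries claim the nearest unclaimed box within max_dist;
    object queries pick up whatever boxes remain.
    """
    free = list(frame)
    empty: Box = (0.0, 0.0, 0.0, 0.0)
    outputs: List[Output] = []
    for q in queries:
        if q.track_id is not None and q.embedding is not None:
            prev = q.embedding

            def dist(b: Box) -> float:
                return math.hypot(b[0] - prev[0], b[1] - prev[1])

            best = min(free, key=dist, default=None)
            if best is not None and dist(best) <= max_dist:
                free.remove(best)
                outputs.append(Output(best, 1.0, best))
            else:
                outputs.append(Output(prev, 0.0, prev))  # object lost
        elif free:
            box = free.pop(0)
            outputs.append(Output(box, 1.0, box))        # new object found
        else:
            outputs.append(Output(empty, 0.0, empty))    # empty object query
    return outputs


def track_step(frame: Sequence[Box], track_queries: List[Query],
               next_id: Iterator[int], num_object_queries: int = 3,
               keep_thresh: float = 0.5
               ) -> Tuple[List[Query], Dict[int, Box]]:
    """One frame: decode track + object queries, keep the confident ones."""
    fresh = [Query(None, None) for _ in range(num_object_queries)]
    queries = track_queries + fresh
    outputs = toy_decoder(frame, queries)
    next_tracks: List[Query] = []
    results: Dict[int, Box] = {}
    for q, out in zip(queries, outputs):
        if out.score < keep_thresh:
            continue  # drop lost tracks and empty object queries
        tid = q.track_id if q.track_id is not None else next(next_id)
        results[tid] = out.box
        # The output embedding becomes the track query for the next frame.
        next_tracks.append(Query(out.embedding, tid))
    return next_tracks, results


def track_video(frames: Sequence[Sequence[Box]]) -> List[Dict[int, Box]]:
    """Run the autoregressive loop over all frames."""
    next_id = count()
    tracks: List[Query] = []
    per_frame: List[Dict[int, Box]] = []
    for frame in frames:
        tracks, results = track_step(frame, tracks, next_id)
        per_frame.append(results)
    return per_frame
```

In this sketch, identity is preserved purely by carrying a query forward; there is no separate matching step between frames, which is the point the abstract makes about data association. In the actual model, attention over image features would take the place of the distance heuristic.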


research
05/07/2021

MOTR: End-to-End Multiple-Object Tracking with TRansformer

The key challenge in multiple-object tracking (MOT) task is temporal mod...
research
03/27/2021

Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers

Tracking a time-varying indefinite number of objects in a video sequence...
research
03/22/2021

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

In video object tracking, there exist rich temporal contexts among succe...
research
10/26/2022

End-to-end Tracking with a Multi-query Transformer

Multiple-object tracking (MOT) is a challenging task that requires simul...
research
10/17/2022

Track Targets by Dense Spatio-Temporal Position Encoding

In this work, we propose a novel paradigm to encode the position of targ...
research
06/30/2023

S.T.A.R.-Track: Latent Motion Models for End-to-End 3D Object Tracking with Adaptive Spatio-Temporal Appearance Representations

Following the tracking-by-attention paradigm, this paper introduces an o...
research
03/24/2022

Global Tracking Transformers

We present a novel transformer-based architecture for global multi-objec...
