Rethinking the competition between detection and ReID in Multi-Object Tracking

by Chao Liang, et al.

Because they balance accuracy and speed, one-shot models that jointly learn detection and ReID have drawn great attention in multi-object tracking (MOT). However, the differences between these two tasks in the one-shot tracking paradigm are often overlooked, leading to performance inferior to that of two-stage methods. In this paper, we dissect the reasoning process of the two tasks. Our analysis reveals that the competition between them inevitably hurts the learning of task-dependent representations, which in turn degrades tracking performance. To remedy this issue, we propose a novel cross-correlation network that impels the separate branches to learn task-dependent representations. Furthermore, we introduce a scale-aware attention network that learns discriminative embeddings to improve the ReID capability. We integrate these networks into a one-shot online MOT system, dubbed CSTrack. Without bells and whistles, our model achieves new state-of-the-art performance on MOT16 and MOT17. We will release our code to facilitate further work.
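
To make the one-shot paradigm concrete, the following is a minimal sketch of a detection head and a ReID head reading the same shared feature map, which is the setting where the abstract says the two tasks compete. It is not the CSTrack architecture: the cross-correlation and scale-aware attention networks are not reproduced here, and the function name `one_shot_heads`, the weight shapes, and the random inputs are all assumptions chosen for illustration.

```python
import numpy as np


def one_shot_heads(features, w_det, w_reid):
    """Apply two per-pixel linear heads to one shared feature map.

    Hypothetical illustration only (not the CSTrack model).
    features: (C, H, W) shared backbone output
    w_det:    (1, C) weights of a 1x1 detection head
    w_reid:   (D, C) weights of a 1x1 ReID embedding head
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)

    # Detection branch: one objectness score per location, squashed by a sigmoid.
    det_logits = w_det @ flat
    det_scores = 1.0 / (1.0 + np.exp(-det_logits))

    # ReID branch: one D-dimensional embedding per location, L2-normalized.
    emb = w_reid @ flat
    emb = emb / (np.linalg.norm(emb, axis=0, keepdims=True) + 1e-12)

    return det_scores.reshape(h, w), emb.reshape(-1, h, w)


# Random stand-ins for a backbone output and head weights (assumed sizes).
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8, 8))
w_det = rng.standard_normal((1, 16))
w_reid = rng.standard_normal((32, 16))

scores, embeddings = one_shot_heads(features, w_det, w_reid)
print(scores.shape, embeddings.shape)
```

Both outputs are computed from the same `features` array, so in a trained model the gradients of the detection loss and the ReID loss would both flow into the same shared representation. That shared dependence is the coupling the abstract refers to when it describes competition between the two tasks.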


Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

Significant progress has been made in Video Object Segmentation (VOS), t...

SMOT: Single-Shot Multi Object Tracking

We present single-shot multi-object tracker (SMOT), a new tracking frame...

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Multiple object tracking and segmentation requires detecting, tracking, ...

Multi-object Tracking with a Hierarchical Single-branch Network

Recent Multiple Object Tracking (MOT) methods have gradually attempted t...

Real-Time Visual Object Tracking via Few-Shot Learning

Visual Object Tracking (VOT) can be seen as an extended task of Few-Shot...

RetinaTrack: Online Single Stage Joint Detection and Tracking

Traditionally multi-object tracking and object detection are performed u...

MOTS: Multiple Object Tracking for General Categories Based On Few-Shot Method

Most modern Multi-Object Tracking (MOT) systems typically apply REID-bas...