Z-GMOT: Zero-shot Generic Multiple Object Tracking

05/28/2023
by Kim Hoang Tran, et al.

Despite significant progress in recent years, Multi-Object Tracking (MOT) approaches still suffer from several limitations, including their reliance on prior knowledge of tracking targets, which necessitates the costly annotation of large labeled datasets. As a result, existing MOT methods are limited to a small set of predefined categories and struggle with unseen objects in the real world. To address these issues, Generic Multiple Object Tracking (GMOT) has been proposed, which requires less prior information about the targets. However, all existing GMOT approaches follow a one-shot paradigm, relying mainly on the initial bounding box, and thus struggle to handle variations in viewpoint, lighting, occlusion, and scale. In this paper, we introduce a novel approach to address the limitations of existing MOT and GMOT methods. Specifically, we propose a zero-shot GMOT (Z-GMOT) algorithm that can track never-seen object categories with zero training examples, without the need for predefined categories or an initial bounding box. To achieve this, we propose iGLIP, an improved version of Grounded Language-Image Pre-training (GLIP), which can detect unseen objects while minimizing false positives. We evaluate Z-GMOT thoroughly on the GMOT-40 dataset and on the AnimalTrack and DanceTrack test sets. The results of these evaluations demonstrate a significant improvement over existing methods. For instance, on the GMOT-40 dataset, Z-GMOT outperforms one-shot GMOT with OC-SORT by 27.79 points HOTA and 44.37 points MOTA. On the AnimalTrack dataset, it surpasses fully-supervised methods with DeepSORT by 12.55 points HOTA and 8.97 points MOTA. To facilitate further research, we will make our code and models publicly available upon acceptance of this paper.
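To make the zero-shot tracking-by-detection idea concrete, the sketch below pairs a text-prompted detector with a simple frame-to-frame association step. It is an illustration under stated assumptions, not the authors' Z-GMOT or iGLIP implementation: the `detect` callable stands in for any open-vocabulary detector, the names `track` and `iou_thresh` are hypothetical, and greedy IoU matching replaces the association used by trackers such as OC-SORT or DeepSORT.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track(
    frames: Sequence[object],
    prompt: str,
    detect: Callable[[object, str], List[Box]],
    iou_thresh: float = 0.3,
) -> List[Dict[int, Box]]:
    """Return, per frame, a mapping from track id to box.

    `detect` is a hypothetical stand-in for a text-prompted detector:
    it receives a frame and a prompt and returns boxes. No category list
    and no initial bounding box are supplied by the caller.
    """
    tracks: Dict[int, Box] = {}  # boxes from the previous frame only
    next_id = 0
    results: List[Dict[int, Box]] = []
    for frame in frames:
        boxes = detect(frame, prompt)
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            ((iou(tb, db), tid, j)
             for tid, tb in tracks.items()
             for j, db in enumerate(boxes)),
            reverse=True,
        )
        assigned: Dict[int, int] = {}  # detection index -> track id
        used_tracks = set()
        for score, tid, j in pairs:
            if score < iou_thresh:
                break  # remaining pairs overlap even less
            if tid in used_tracks or j in assigned:
                continue
            assigned[j] = tid
            used_tracks.add(tid)
        # Unmatched detections start new tracks; unmatched tracks are dropped
        # (a simplification: there is no memory for occluded objects).
        current: Dict[int, Box] = {}
        for j, box in enumerate(boxes):
            tid = assigned.get(j)
            if tid is None:
                tid = next_id
                next_id += 1
            current[tid] = box
        tracks = current
        results.append(dict(current))
    return results
```

The only category-specific input here is the text prompt, which is the sense in which the pipeline is zero-shot; the quality of the result then depends almost entirely on how well the detector suppresses false positives.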

research
11/20/2020

Open-Vocabulary Object Detection Using Captions

Despite the remarkable accuracy of deep neural networks in object detect...
research
11/24/2020

GMOT-40: A Benchmark for Generic Multiple Object Tracking

Multiple Object Tracking (MOT) has witnessed remarkable advances in rece...
research
12/22/2022

SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments

Object instance segmentation is a key challenge for indoor robots naviga...
research
03/16/2018

Deep Multiple Instance Learning for Zero-shot Image Tagging

In-line with the success of deep learning on traditional recognition pro...
research
05/19/2020

MOTS: Multiple Object Tracking for General Categories Based On Few-Shot Method

Most modern Multi-Object Tracking (MOT) systems typically apply REID-bas...
research
12/09/2021

Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings

Dense object tracking, the ability to localize specific object points wi...
research
03/08/2022

Universal Prototype Transport for Zero-Shot Action Recognition and Localization

This work addresses the problem of recognizing action categories in vide...
