Type-to-Track: Retrieve Any Object via Prompt-based Tracking

05/22/2023
by Pha Nguyen, et al.

A recent trend in vision problems is to describe the objects of interest with natural language captions, which can overcome some limitations of traditional methods that rely on bounding boxes or category annotations. This paper introduces Type-to-Track, a novel paradigm for Multiple Object Tracking that lets users track objects in videos by typing natural language descriptions. We present GroOT, a new dataset for this Grounded Multiple Object Tracking task, which contains videos with various types of objects and corresponding textual captions describing their appearance and actions in detail. Additionally, we introduce two new evaluation protocols and formulate evaluation metrics specifically for this task. We develop an efficient new method, a transformer-based eMbed-ENcoDE-extRact framework (MENDER), modeled using third-order tensor decomposition. Experiments in five scenarios show that our MENDER approach outperforms another two-stage design in terms of accuracy and efficiency, up to 14.7
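As a rough illustration of why decomposing a third-order tensor can help efficiency, the sketch below builds a small tensor from three factor matrices using the generic rank-R CP (canonical polyadic) form. This is not MENDER's actual formulation, which the abstract does not spell out; the sizes, the rank, and the names `cp_reconstruct`, `A`, `B`, and `C` are all hypothetical and chosen only for the example.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from rank-R CP factors.

    T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)


# Hypothetical sizes for the three modes and a hypothetical rank.
I, J, K, R = 6, 5, 4, 2

rng = np.random.default_rng(0)
A = rng.standard_normal((I, R))  # factor matrix for mode 1
B = rng.standard_normal((J, R))  # factor matrix for mode 2
C = rng.standard_normal((K, R))  # factor matrix for mode 3

T = cp_reconstruct(A, B, C)

# Storing the factors takes (I + J + K) * R numbers,
# while the full tensor takes I * J * K numbers.
full_entries = T.size
factor_entries = A.size + B.size + C.size
print(T.shape, full_entries, factor_entries)
```

With these toy sizes the full tensor holds 120 entries while the three factors hold 30, and the gap widens as the modes grow. Whether and how the paper exploits this kind of saving is described in its full text, not in the abstract.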


