Grounding-Tracking-Integration

12/13/2019
by   Zhengyuan Yang, et al.
10

In this paper, we study tracking by language that localizes the target box sequence in a video based on a language query. We propose a framework called GTI that decomposes the problem into three sub-tasks: Grounding, Tracking and Integration. The three sub-task modules operate simultaneously and predict the box sequence frame-by-frame. "Grounding" predicts the referred region directly from the language query. "Tracking" localizes the target based on the history of the grounded regions in previous frames. "Integration" generates final predictions by synergistically combining grounding and tracking. With the "integration" task as the key, we explore how to indicate the quality of the grounded regions in each frame and achieve the desired mutually beneficial combination. To this end, we propose an "RT-integration" method that defines and predicts two scores to guide the integration: 1) R-score represents the Region correctness whether the grounding prediction accurately covers the target, and 2) T-score represents the Template quality whether the region provides informative visual cues to improve tracking in future frames. We present our real-time GTI implementation with the proposed RT-integration, and benchmark the framework on LaSOT and Lingual OTB99 with highly promising results. Moreover, a disambiguated version of LaSOT queries can be used to facilitate future tracking by language studies.

READ FULL TEXT

page 4

page 8

page 11

research
03/21/2023

Joint Visual Grounding and Tracking with Natural Language Specification

Tracking by natural language specification aims to locate the referred t...
research
03/15/2022

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding

Natural language spatial video grounding aims to detect the relevant obj...
research
04/07/2020

Dense Regression Network for Video Grounding

We address the problem of video grounding from natural language queries....
research
05/23/2023

Cross3DVG: Baseline and Dataset for Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

We present Cross3DVG, a novel task for cross-dataset visual grounding in...
research
09/10/2021

Panoptic Narrative Grounding

This paper proposes Panoptic Narrative Grounding, a spatially fine and g...
research
12/24/2021

Grounding Linguistic Commands to Navigable Regions

Humans have a natural ability to effortlessly comprehend linguistic comm...
research
03/14/2022

Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention

Grounding a command to the visual environment is an essential ingredient...

Please sign up or login with your details

Forgot password? Click here to reset