Tracking Objects and Activities with Attention for Temporal Sentence Grounding

02/21/2023
by Zeyu Xiong, et al.

Temporal sentence grounding (TSG) aims to localize the temporal segment of an untrimmed video that is semantically aligned with a natural language query. Most existing methods extract frame-level or object-level features with a 3D ConvNet or a detection network under a conventional TSG framework. As a result, they fail to capture subtle differences between frames or to model the spatio-temporal behavior of the core persons/objects. In this paper, we introduce a new perspective on the TSG task: tracking pivotal objects and activities to learn finer-grained spatio-temporal behaviors. Specifically, we propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator that generates multi-modal templates and a search space while filtering objects and activities, and (B) a Temporal Sentence Tracker that tracks the multi-modal targets to model their behavior and predicts the query-related segment. Extensive experiments and comparisons with state-of-the-art methods are conducted on the challenging Charades-STA and TACoS benchmarks, where TSTNet achieves leading performance while running at real-time speed.
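The abstract describes TSTNet only at the level of its two modules. As a rough aid for readers new to TSG, the Python sketch below walks through the general data flow on toy feature vectors. It keeps the candidate targets most similar to the query, scores every frame against a template, and reads off the longest above-threshold run of frames as the predicted segment. It then evaluates that segment with temporal IoU and R@1, IoU=m, the metric commonly reported on Charades-STA and TACoS. This is not TSTNet. Plain cosine similarity stands in for the learned cross-modal generator and tracker, and every function name (`filter_targets`, `track_scores`, `predict_segment`, `temporal_iou`, `recall_at_iou`) and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Segment = Tuple[float, float]  # (start, end), end exclusive


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_targets(query: Vector, candidates: List[Vector], k: int) -> List[int]:
    """Hypothetical stand-in for target filtering: return the indices of the
    k candidate object/activity features most similar to the query."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(query, candidates[i]),
                    reverse=True)
    return ranked[:k]


def track_scores(template: Vector, frames: List[Vector]) -> List[float]:
    """Crude proxy for tracking: score each frame by its similarity to the template."""
    return [cosine(template, f) for f in frames]


def predict_segment(scores: List[float], threshold: float) -> Tuple[int, int]:
    """Return the longest run of consecutive frames with score >= threshold
    as (start, end) frame indices, end exclusive; (0, 0) if no frame passes."""
    best = (0, 0)
    start = None
    # A -inf sentinel guarantees that a run reaching the last frame is closed.
    for i, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # For overlapping intervals the union equals the hull; otherwise inter is 0.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Segment], gts: List[Segment], m: float) -> float:
    """R@1, IoU=m: fraction of queries whose top prediction has IoU >= m."""
    hits = sum(temporal_iou(p, g) >= m for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
    keep = filter_targets(query, candidates, k=2)
    template = candidates[keep[0]]  # use the best-matching target as the template
    frames = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [0.8, 0.2], [0.1, 1.0]]
    seg = predict_segment(track_scores(template, frames), threshold=0.5)
    print(keep, seg, temporal_iou(seg, (1, 5)))
```

On these toy inputs the filter keeps candidates 0 and 2, frames 1 to 3 score above the threshold, and the predicted segment `(1, 4)` overlaps a ground truth of `(1, 5)` with an IoU of 0.75. A learned model would replace each cosine-similarity step with trained cross-modal features and a trained tracker. The evaluation functions, however, match how TSG predictions are typically scored.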
