Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

01/25/2020
by Zhenfang Chen, et al.

In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video. Specifically, given an untrimmed video and a query sentence, our goal is to localize a temporal segment in the video that semantically corresponds to the query sentence, with no reliance on any temporal annotation during training. We propose a two-stage model to tackle this problem in a coarse-to-fine manner. In the coarse stage, we first generate a set of fixed-length temporal proposals using multi-scale sliding windows, and match their visual features against the sentence features to identify the best-matched proposal as a coarse grounding result. In the fine stage, we perform a fine-grained matching between the visual features of the frames in the best-matched proposal and the sentence features to locate the precise frame boundary of the fine grounding result. Comprehensive experiments on the ActivityNet Captions dataset and the Charades-STA dataset demonstrate that our two-stage model achieves compelling performance.
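
The coarse-to-fine procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the mean pooling of frame features, the cosine similarity score, the 50% window stride, and the threshold rule in the fine stage are all assumptions made for illustration, and every function and parameter name (`sliding_window_proposals`, `coarse_stage`, `fine_stage`, `stride_ratio`, `threshold`) is hypothetical. Features are plain lists of floats; in practice they would come from learned visual and sentence encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pool(frames):
    """Average a list of frame feature vectors into one vector (assumed pooling)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def sliding_window_proposals(num_frames, window_sizes, stride_ratio=0.5):
    """Generate fixed-length (start, end) proposals at several window scales.

    `end` is exclusive. The stride is a fraction of the window length,
    which is an assumption, not a value taken from the paper.
    """
    proposals = []
    for w in window_sizes:
        if w > num_frames:
            continue
        stride = max(1, int(w * stride_ratio))
        for start in range(0, num_frames - w + 1, stride):
            proposals.append((start, start + w))
    return proposals


def coarse_stage(frame_feats, sent_feat, window_sizes):
    """Return the proposal whose pooled visual feature best matches the sentence."""
    proposals = sliding_window_proposals(len(frame_feats), window_sizes)
    return max(
        proposals,
        key=lambda p: cosine(mean_pool(frame_feats[p[0]:p[1]]), sent_feat),
    )


def fine_stage(frame_feats, sent_feat, proposal, threshold=0.5):
    """Tighten the proposal using per-frame matching scores.

    Assumed rule: keep the span from the first to the last frame whose
    similarity to the sentence reaches `threshold`; fall back to the
    coarse proposal if no frame qualifies.
    """
    start, end = proposal
    keep = [
        t for t in range(start, end)
        if cosine(frame_feats[t], sent_feat) >= threshold
    ]
    if not keep:
        return proposal
    return (keep[0], keep[-1] + 1)
```

Calling `coarse_stage` first and passing its result to `fine_stage` mirrors the two stages: the coarse stage only chooses among fixed-length windows, while the fine stage can move the boundaries to individual frames inside the chosen window.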

research
07/18/2023

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

3D visual grounding involves finding a target object in a 3D scene that ...
research
03/16/2020

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

The task of temporally grounding textual queries in videos is to localiz...
research
03/09/2023

Text-Visual Prompting for Efficient 2D Temporal Video Grounding

In this paper, we study the problem of temporal video grounding (TVG), w...
research
05/08/2018

Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction

We study weakly-supervised video object grounding: given a video segment...
research
07/20/2023

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Temporal video grounding (TVG) aims to retrieve the time interval of a l...
research
06/06/2019

Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video

In this paper, we address a novel task, namely weakly-supervised spatio-...
research
09/12/2023

Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding

Make-up temporal video grounding (MTVG) aims to localize the target vide...
