Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction

09/11/2019
by Jingwen Wang, et al.

The task of temporally grounding language queries in videos is to localize, in time, the video segment that best matches a given language query (sentence). It requires a model to perform visual and linguistic understanding jointly. Previous work has largely overlooked the precision of segment localization: sliding-window methods rely on predefined window sizes and incur redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue with an end-to-end boundary-aware model that uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To detect semantic boundaries more accurately, we aggregate contextual information by explicitly modeling the relationship between each element and its neighbors. At test time, the most confident segments are selected based on both anchor and boundary predictions. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets.
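The abstract only summarizes how anchor and boundary predictions are combined at test time. The sketch below is a rough, hypothetical illustration of that idea, not the paper's actual scoring rule: it assumes the model emits a boundary probability per time step and a confidence per (time step, anchor length) pair, and all names (score_segments, anchor_scores, boundary_scores, anchor_lengths) are illustrative only.

```python
import numpy as np

def score_segments(anchor_scores, boundary_scores, anchor_lengths):
    """Hypothetical test-time fusion of anchor and boundary predictions.

    anchor_scores:   (T, K) array; confidence that the segment ending at
                     time step t with the k-th predefined length matches
                     the query.
    boundary_scores: (T,) array; probability that time step t is a
                     semantic boundary for the query.
    anchor_lengths:  list of K segment lengths, in time steps.

    Returns (start, end, score) candidates sorted by descending score.
    """
    T, K = anchor_scores.shape
    candidates = []
    for t in range(T):
        for k, length in enumerate(anchor_lengths):
            start = t - length + 1
            if start < 0:
                continue
            # Combine the anchor confidence with the boundary
            # probabilities at the segment's two endpoints.
            score = anchor_scores[t, k] * boundary_scores[start] * boundary_scores[t]
            candidates.append((start, t, float(score)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Toy usage: 8 time steps, anchors of length 2 and 4.
rng = np.random.default_rng(0)
anchor_scores = rng.random((8, 2))
boundary_scores = rng.random(8)
print(score_segments(anchor_scores, boundary_scores, anchor_lengths=[2, 4])[:3])
```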

Related research:

- Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos (10/31/2019)
- Boundary Proposal Network for Two-Stage Natural Language Video Localization (03/15/2021)
- Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos (01/21/2019)
- Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos (05/12/2022)
- HTNet: Anchor-free Temporal Action Localization with Hierarchical Transformers (07/20/2022)
- A Survey on Temporal Sentence Grounding in Videos (09/16/2021)
- Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video (01/18/2020)