Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

01/02/2023
by   Jiahao Zhu, et al.

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as its start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning bias: such incorrect boundary frames also cause reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and to generate soft labels on boundaries for more accurate frame-query reasoning. This mechanism also supplements the absent consecutive visual semantics of the sparsely sampled frames, enabling fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
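The boundary bias described above is easy to reproduce: uniform sparse sampling almost never lands exactly on the annotated start/end frames, so the nearest sampled frame silently becomes the new boundary. The sketch below illustrates this drift and one common way to soften the supervision signal (a Gaussian-smoothed boundary label). This is a minimal illustration only, not the authors' SSRN implementation; all function names and the smoothing choice are hypothetical.

```python
import numpy as np

def sparse_sample(num_frames: int, num_samples: int) -> np.ndarray:
    """Uniformly pick `num_samples` frame indices from a `num_frames`-long video."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)

def nearest_sampled(frame_idx: int, sampled: np.ndarray) -> int:
    """Position in the sampled sequence closest to an annotated frame index."""
    return int(np.abs(sampled - frame_idx).argmin())

def soft_boundary_labels(pos: int, length: int, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-smoothed one-hot label over sampled positions (a 'soft' boundary)."""
    x = np.arange(length)
    w = np.exp(-0.5 * ((x - pos) / sigma) ** 2)
    return w / w.sum()

# A 3000-frame video downsampled to 64 frames: the annotated start frame 1019
# is lost during sampling, and the nearest sampled frame becomes the (biased)
# new start boundary.
sampled = sparse_sample(3000, 64)
start_pos = nearest_sampled(1019, sampled)
labels = soft_boundary_labels(start_pos, len(sampled))
```

Here the sampled frame at `start_pos` differs from the true annotated frame 1019; spreading the label mass over neighboring positions (instead of a hard one-hot target on the wrong frame) is one generic way to hedge against this kind of boundary drift.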


research
03/22/2021

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

This paper addresses the problem of temporal sentence grounding (TSG), w...
research
08/31/2022

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

This paper studies the multimedia problem of temporal sentence grounding...
research
01/05/2023

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

Given an untrimmed video, temporal sentence localization (TSL) aims to l...
research
03/02/2023

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

Temporal sentence localization in videos (TSLV) aims to retrieve the mos...
research
03/31/2021

Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding

Temporal grounding aims to localize temporal boundaries within untrimmed...
research
05/06/2023

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

This paper addresses the temporal sentence grounding (TSG). Although exi...
research
09/14/2021

Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding

A key solution to temporal sentence grounding (TSG) exists in how to lea...
