Learning Sample Importance for Cross-Scenario Video Temporal Grounding

01/08/2022
by Peijun Bao, et al.

Temporal grounding aims to locate a video moment in an untrimmed video that matches a given sentence query. This paper is the first to investigate superficial biases specific to the temporal grounding task and to propose a targeted solution. Most alarmingly, we observe that existing temporal grounding models rely heavily on biases in the visual modality (e.g., a strong preference for frequent concepts or certain temporal intervals), which leads to inferior performance when the model is generalized to a cross-scenario test setting. To this end, we propose a novel method, the Debiased Temporal Language Localizer (DebiasTLL), which prevents the model from naively memorizing these biases and forces it to ground the query sentence on the true inter-modal relationship. DebiasTLL trains two models simultaneously. By design, a large discrepancy between the two models' predictions on a sample indicates a higher probability that the sample is biased. Harnessing this informative discrepancy, we devise a data re-weighting scheme that mitigates the data biases. We evaluate the proposed model on cross-scenario temporal grounding, where the train and test data are heterogeneously sourced. Experiments show a large-margin advantage of the proposed method over state-of-the-art competitors.
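The discrepancy-driven re-weighting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract only states that a large prediction discrepancy between the two co-trained models signals a likely biased sample and drives the re-weighting, so the exponential weighting function and the score representation here are assumptions for illustration.

```python
import numpy as np

def sample_weights(pred_a, pred_b, tau=1.0):
    """Down-weight samples on which the two co-trained models disagree.

    pred_a, pred_b: per-sample prediction scores from the two models
    (here assumed to be grounding score vectors for the same
    (video, query) pairs; the exact representation is hypothetical).
    """
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    # Per-sample discrepancy: mean absolute difference of the scores.
    disc = np.abs(pred_a - pred_b).mean(axis=-1)
    # Map discrepancy to a weight in (0, 1]:
    # perfect agreement -> weight 1, large disagreement -> weight near 0.
    return np.exp(-disc / tau)

def reweighted_loss(losses, weights):
    # Weighted mean of the per-sample grounding losses, so that
    # suspected-biased samples contribute less to the gradient.
    losses = np.asarray(losses, dtype=float)
    return float((weights * losses).sum() / weights.sum())
```

With this scheme, samples where the two models agree keep a weight near 1, while samples with large prediction gaps (the likely biased ones) are suppressed in the training loss.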

