Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding

03/31/2021
by   Hao Zhou, et al.
0

Temporal grounding aims to localize temporal boundaries within untrimmed videos by language queries, but it faces the challenge of two types of inevitable human uncertainties: query uncertainty and label uncertainty. The two uncertainties stem from human subjectivity, leading to limited generalization ability of temporal grounding. In this work, we propose a novel DeNet (Decoupling and De-bias) to embrace human uncertainty: Decoupling - We explicitly disentangle each query into a relation feature and a modified feature. The relation feature, which is mainly based on skeleton-like words (including nouns and verbs), aims to extract basic and consistent information in the presence of query uncertainty. Meanwhile, modified feature assigned with style-like words (including adjectives, adverbs, etc) represents the subjective information, and thus brings personalized predictions; De-bias - We propose a de-bias mechanism to generate diverse predictions, aim to alleviate the bias caused by single-style annotations in the presence of label uncertainty. Moreover, we put forward new multi-label metrics to diversify the performance evaluation. Extensive experiments show that our approach is more effective and robust than state-of-the-arts on Charades-STA and ActivityNet Captions datasets.

READ FULL TEXT

page 1

page 3

page 7

page 8

research
03/12/2023

Towards Diverse Temporal Grounding under Single Positive Labels

Temporal grounding aims to retrieve moments of the described event withi...
research
01/02/2023

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Temporal sentence grounding (TSG) aims to identify the temporal boundary...
research
09/23/2020

A Simple Yet Effective Method for Video Temporal Grounding with Cross-Modality Attention

The task of language-guided video temporal grounding is to localize the ...
research
03/24/2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

Temporal grounding in videos aims to localize one target video segment t...
research
03/28/2023

That Label's Got Style: Handling Label Style Bias for Uncertain Image Segmentation

Segmentation uncertainty models predict a distribution over plausible se...
research
02/20/2023

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

Temporal sentence grounding aims to detect the event timestamps describe...
research
03/10/2022

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

Temporal Sentence Grounding in Videos (TSGV), which aims to ground a nat...

Please sign up or login with your details

Forgot password? Click here to reset