A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics

01/22/2021
by   Yitian Yuan, et al.

Although Temporal Sentence Grounding in Videos (TSGV) has made impressive progress over the last few years, current TSGV models tend to capture moment annotation biases rather than take full advantage of their multi-modal inputs. Strikingly, some extremely simple TSGV baselines, even without any training, can achieve state-of-the-art performance. In this paper, we first take a closer look at the existing evaluation protocol and argue that both the prevailing datasets and metrics are responsible for this unreliable benchmarking. To this end, we propose to re-organize two widely used TSGV datasets (Charades-STA and ActivityNet Captions), deliberately Changing the moment annotation Distribution of the test split so that it differs from the training split; the re-organized datasets are dubbed Charades-CD and ActivityNet-CD, respectively. We further introduce a new evaluation metric, "dR@n,IoU@m", which calibrates the basic IoU score by penalizing over-long moment predictions, thereby reducing the inflated performance caused by moment annotation biases. Under this new evaluation protocol, we conduct extensive experiments and ablation studies on eight state-of-the-art TSGV models. All results demonstrate that the re-organized datasets and the new metric can better monitor progress in TSGV, which is still far from satisfactory. The repository of this work is at <https://github.com/yytzsy/grounding_changing_distribution>.
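To make the metric concrete, the sketch below computes the standard temporal IoU between a predicted and a ground-truth moment, plus an illustrative "discounted" variant that scales IoU by how close each predicted boundary is to the ground-truth boundary. The boundary-distance discount shown here is an assumption for illustration, not necessarily the authors' exact formula; its effect is the same in spirit: an over-long prediction that swallows the ground truth keeps a non-trivial plain IoU but is discounted heavily.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) moments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def discounted_iou(pred, gt, video_len):
    """IoU scaled by normalized start/end boundary errors.

    NOTE: this discount is an illustrative assumption; the paper's
    dR@n,IoU@m may use a different calibration.
    """
    alpha_s = 1.0 - abs(pred[0] - gt[0]) / video_len  # start-boundary accuracy
    alpha_e = 1.0 - abs(pred[1] - gt[1]) / video_len  # end-boundary accuracy
    return temporal_iou(pred, gt) * alpha_s * alpha_e

# A prediction covering the whole 30 s video around a 10-20 s ground
# truth still scores IoU = 1/3, but the discount pulls it down:
gt = (10.0, 20.0)
print(temporal_iou((0.0, 30.0), gt))            # ~0.333
print(discounted_iou((0.0, 30.0), gt, 30.0))    # ~0.148
```

A perfectly localized prediction is unaffected (both discount factors are 1), so the variant only separates models that inflate their score with over-long moments.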


Related research

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach (03/10/2022)
Temporal Sentence Grounding in Videos (TSGV), which aims to ground a nat...

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation (08/08/2023)
Temporal sentence grounding (TSG) aims to locate a specific moment from ...

Learning Sample Importance for Cross-Scenario Video Temporal Grounding (01/08/2022)
The task of temporal grounding aims to locate video moment in an untrimm...

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval (09/01/2020)
The query-based moment retrieval is a problem of localising a specific c...

Deconfounded Video Moment Retrieval with Causal Intervention (06/03/2021)
We tackle the task of video moment retrieval (VMR), which aims to locali...

A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos (09/30/2022)
Understanding the steps required to perform a task is an important skill...

Small but Mighty: New Benchmarks for Split and Rephrase (09/17/2020)
Split and Rephrase is a text simplification task of rewriting a complex ...
