Video Moment Retrieval from Text Queries via Single Frame Annotation

04/20/2022
by Ran Cui, et al.

Video moment retrieval aims at finding the start and end timestamps of a moment (part of a video) described by a given natural language query. Fully supervised methods need complete temporal boundary annotations to achieve promising results, which is costly since the annotator needs to watch the whole moment. Weakly supervised methods rely only on the paired video and query, but their performance is relatively poor. In this paper, we look closer into the annotation process and propose a new paradigm called "glance annotation". This paradigm requires the timestamp of only a single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue this is beneficial because, compared to weak supervision, it adds trivial annotation cost yet offers more potential for performance. Under the glance annotation setting, we propose a method named Video moment retrieval via Glance Annotation (ViGA), based on contrastive learning. ViGA cuts the input video into clips and contrasts clips with queries, assigning glance-guided Gaussian-distributed weights to all clips. Our extensive experiments indicate that ViGA outperforms state-of-the-art weakly supervised methods by a large margin and is even comparable to fully supervised methods in some cases.
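
The glance-guided weighting described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the equal-length clip split, and the `sigma` parameter (expressed here as a fraction of the video duration) are all assumptions made for illustration. The sketch only shows the general idea of assigning each clip a Gaussian weight centered on the annotated glance timestamp.

```python
import math


def glance_gaussian_weights(num_clips, video_duration, glance_time, sigma=0.1):
    """Return one weight per clip, largest for the clip nearest the glance.

    Hypothetical helper for illustration only. The video is assumed to be
    split into `num_clips` equal-length clips, and `sigma` is assumed to be
    a fraction of the total video duration.
    """
    if num_clips <= 0 or video_duration <= 0 or sigma <= 0:
        raise ValueError("num_clips, video_duration and sigma must be positive")

    clip_len = video_duration / num_clips
    weights = []
    for i in range(num_clips):
        # Temporal center of clip i, in seconds.
        center = (i + 0.5) * clip_len
        # Distance from the glance, normalized by the Gaussian width.
        z = (center - glance_time) / (sigma * video_duration)
        # Unnormalized Gaussian: 1.0 exactly at the glance, decaying away.
        weights.append(math.exp(-0.5 * z * z))
    return weights


# Example: a 30-second video cut into 10 clips, with a glance at 13 seconds.
weights = glance_gaussian_weights(num_clips=10, video_duration=30.0, glance_time=13.0)
peak_clip = max(range(len(weights)), key=weights.__getitem__)
```

In a contrastive setup, weights like these could scale how strongly each clip is treated as a positive for the paired query, so clips near the glance contribute most and distant clips contribute little.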


research
08/08/2023

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

Temporal sentence grounding (TSG) aims to locate a specific moment from ...
research
05/23/2023

Faster Video Moment Retrieval with Point-Level Supervision

Video Moment Retrieval (VMR) aims at retrieving the most relevant events...
research
04/05/2019

Weakly Supervised Video Moment Retrieval From Text Queries

There have been a few recent methods proposed in text to video moment re...
research
11/04/2021

Multi-scale 2D Representation Learning for weakly-supervised moment retrieval

Video moment retrieval aims to search the moment most relevant to a give...
research
09/04/2020

Video Moment Retrieval via Natural Language Queries

In this paper, we propose a novel method for video moment retrieval (VMR...
research
09/27/2019

wMAN: Weakly-supervised Moment Alignment Network for Text-based Video Segment Retrieval

Given a video and a sentence, the goal of weakly-supervised video moment...
research
08/31/2019

WSLLN: Weakly Supervised Natural Language Localization Networks

We propose weakly supervised language localization networks (WSLLN) to d...
