Natural Language Video Localization with Learnable Moment Proposals

09/22/2021
by   Shaoning Xiao, et al.

Given an untrimmed video and a natural language query, Natural Language Video Localization (NLVL) aims to identify the video moment described by the query. Existing methods fall roughly into two groups: 1) propose-and-rank models, which first define a set of hand-designed moment candidates and then select the best-matching one; and 2) proposal-free models, which directly predict the two temporal boundaries of the referred moment from frames. Currently, almost all propose-and-rank methods perform worse than their proposal-free counterparts. In this paper, we argue that the propose-and-rank approach is underestimated because of its predefined candidates: 1) hand-designed rules can hardly guarantee complete coverage of the target segments, and 2) densely sampled candidate moments cause redundant computation and degrade the ranking process. To this end, we propose a novel model termed LPNet (Learnable Proposal Network for NLVL) with a fixed set of learnable moment proposals, whose positions and lengths are dynamically adjusted during training. Moreover, we propose a boundary-aware loss that leverages frame-level information to further improve performance. Extensive ablations on two challenging NLVL benchmarks demonstrate the effectiveness of LPNet over existing state-of-the-art methods.
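
To make the idea of a fixed set of moment proposals more concrete, the following is a minimal sketch and not the LPNet implementation. It assumes each proposal is stored as a normalized (center, length) pair and is scored against a ground-truth moment by temporal IoU. The names `Proposal`, `temporal_iou`, and `best_proposal` are hypothetical. In a learnable setting the center and length would be trainable parameters updated by gradient descent; here they are plain floats for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    """Hypothetical moment proposal, normalized to the video duration."""
    center: float  # center position in [0, 1]
    length: float  # duration in (0, 1]

    def span(self) -> Tuple[float, float]:
        # Convert (center, length) to (start, end), clipped to the video.
        start = max(0.0, self.center - self.length / 2)
        end = min(1.0, self.center + self.length / 2)
        return start, end


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def best_proposal(proposals: List[Proposal],
                  target: Tuple[float, float]) -> Tuple[int, float]:
    """Return the index and IoU of the proposal that best overlaps target."""
    ious = [temporal_iou(p.span(), target) for p in proposals]
    idx = max(range(len(ious)), key=ious.__getitem__)
    return idx, ious[idx]


if __name__ == "__main__":
    # Two illustrative proposals covering the first and second half of a video.
    proposals = [Proposal(center=0.25, length=0.5),
                 Proposal(center=0.75, length=0.5)]
    print(best_proposal(proposals, target=(0.5, 1.0)))
```

With a small fixed set like this, coverage of the target depends entirely on where the proposals sit, which is why adjusting their positions and lengths during training matters more than it would for a dense hand-designed grid.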

research
08/20/2019

Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention

This paper studies the problem of temporal moment localization in a long...
research
03/15/2021

Boundary Proposal Network for Two-Stage Natural Language Video Localization

We aim to address the problem of Natural Language Video Localization (NL...
research
06/06/2023

Prompting Large Language Models to Reformulate Queries for Moment Localization

The task of moment localization is to localize a temporal moment in an u...
research
11/19/2019

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network

Video moment retrieval is to search the moment that is most relevant to ...
research
08/19/2020

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

Video moment retrieval aims to localize the target moment in a video ac...
research
04/01/2021

A Survey on Natural Language Video Localization

Natural language video localization (NLVL), which aims to locate a targe...
research
09/14/2021

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

We address the problem of temporal sentence localization in videos (TSLV...
