Progressive Localization Networks for Language-based Moment Localization

02/02/2021
by Qi Zheng, et al.

This paper targets the task of language-based moment localization. The language-based setting of this task allows an open set of target activities, which results in large variation in the temporal lengths of video moments. Most existing methods first sample a sufficient number of candidate moments with various temporal lengths and then match them against the given query to determine the target moment. However, candidate moments generated at a fixed temporal granularity may be suboptimal for handling this large variation in moment lengths. To this end, we propose a novel multi-stage Progressive Localization Network (PLN), which progressively localizes the target moment in a coarse-to-fine manner. Specifically, each stage of PLN has a localization branch that focuses on candidate moments generated at a specific temporal granularity, and this granularity differs across stages. Moreover, we devise a conditional feature manipulation module and an upsampling connection to bridge the multiple localization branches. In this fashion, later stages can absorb the information learned by earlier stages, which facilitates finer-grained localization. Extensive experiments on three public datasets demonstrate the effectiveness of the proposed PLN for language-based moment localization and its potential for localizing short moments in long videos.
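The abstract does not say how candidates are built at each granularity or exactly how stages are bridged. The Python below is a minimal sketch under stated assumptions, not the PLN authors' implementation. It assumes that candidates at granularity g are all clip spans whose boundaries fall on a grid of g clips, that stages run from coarse to fine (g shrinks), that the upsampling connection is nearest-neighbour upsampling of a 2D (start x end) score map, and that conditional feature manipulation behaves like FiLM-style affine modulation. All function names (`candidate_moments`, `progressive_candidates`, `upsample_scores`, `modulate`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating the ideas in the abstract; not PLN's code.


def candidate_moments(num_clips: int, granularity: int) -> List[Tuple[int, int]]:
    """Enumerate (start, end) clip spans, end exclusive, whose boundaries
    fall on a grid of `granularity` clips (assumed candidate scheme)."""
    units = num_clips // granularity
    return [(s * granularity, (e + 1) * granularity)
            for s in range(units) for e in range(s, units)]


def progressive_candidates(num_clips: int, granularities: List[int]) -> Dict[int, int]:
    """Count candidates per stage, assuming coarse-to-fine stages
    (granularity shrinks from one stage to the next)."""
    if granularities != sorted(granularities, reverse=True):
        raise ValueError("expected granularities in coarse-to-fine order")
    return {g: len(candidate_moments(num_clips, g)) for g in granularities}


def upsample_scores(coarse: List[List[float]], factor: int) -> List[List[float]]:
    """Nearest-neighbour upsampling of a 2D (start x end) score map,
    one assumed form of an 'upsampling connection' between stages."""
    n = len(coarse) * factor
    return [[coarse[i // factor][j // factor] for j in range(n)] for i in range(n)]


def modulate(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """FiLM-style feature-wise affine modulation (gamma * x + beta), a
    hypothetical stand-in for the conditional feature manipulation module."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


if __name__ == "__main__":
    print(candidate_moments(16, 8))                   # [(0, 8), (0, 16), (8, 16)]
    print(progressive_candidates(16, [8, 4, 2]))      # {8: 3, 4: 10, 2: 36}
    print(upsample_scores([[1, 2], [0, 3]], 2)[0])    # [1, 1, 2, 2]
    print(modulate([1.0, 2.0], [2.0, 0.5], [0.0, 1.0]))  # [2.0, 2.0]
```

Under these assumptions, a 16-clip video yields 3 candidates at the coarsest stage and 36 at the finest, which shows how quickly the candidate space grows as the granularity shrinks.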

Related research

Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language (12/08/2019)
We address the problem of retrieving a specific moment from an untrimmed...

Temporal Localization of Moments in Video Collections with Natural Language (07/30/2019)
In this paper, we introduce the task of retrieving relevant video moment...

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language (12/04/2020)
We address the problem of retrieving a specific moment from an untrimmed...

Multi-video Moment Ranking with Multimodal Clue (01/29/2023)
Video corpus moment retrieval (VCMR) is the task of retrieving a relevan...

Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos (10/12/2021)
This paper focuses on tackling the problem of temporal language localiza...

MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment (11/30/2018)
This research strives for natural language moment retrieval in long, unt...

Towards Diverse Temporal Grounding under Single Positive Labels (03/12/2023)
Temporal grounding aims to retrieve moments of the described event withi...