MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment

11/30/2018
by Da Zhang, et al.

This research addresses natural language moment retrieval in long, untrimmed video streams. The problem is not trivial, especially when a video contains multiple moments of interest and the language describes complex temporal dependencies, which often happens in real scenarios. We identify two crucial challenges: semantic misalignment and structural misalignment. Existing approaches, however, treat different moments separately and do not explicitly model complex moment-wise temporal relations. In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network. MAN naturally assigns candidate moment representations aligned with language semantics over different temporal locations and scales. Most importantly, we propose to explicitly model moment-wise temporal relations as a structured graph and devise an iterative graph adjustment network to jointly learn the best structure in an end-to-end manner. We evaluate the proposed approach on two challenging public benchmarks, Charades-STA and DiDeMo, where MAN outperforms the state of the art by a large margin.
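
The abstract does not specify how candidate moments are enumerated or how the moment graph is initialized. The sketch below illustrates the general idea of "candidate moments over different temporal locations and scales" linked by "moment-wise temporal relations" under our own assumptions: candidates are multi-scale sliding windows with a stride of half the window length, edges are weighted by temporal IoU, and node features are averaged over neighbours for a fixed number of rounds. It is not MAN's method: MAN learns the graph structure end to end through its iterative graph adjustment network, which is not reproduced here, and all function names (`candidate_moments`, `temporal_iou`, `iou_adjacency`, `propagate`) are hypothetical.

```python
from typing import List, Tuple

# A candidate moment as (start_clip, end_clip); end is exclusive.
Moment = Tuple[int, int]


def candidate_moments(num_clips: int, scales: List[int]) -> List[Moment]:
    """Enumerate multi-scale sliding windows (assumed scheme: stride = half the window length)."""
    moments: List[Moment] = []
    for length in scales:
        stride = max(1, length // 2)
        for start in range(0, num_clips - length + 1, stride):
            moments.append((start, start + length))
    return moments


def temporal_iou(a: Moment, b: Moment) -> float:
    """Temporal intersection-over-union of two moments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_adjacency(moments: List[Moment]) -> List[List[float]]:
    """Fixed (not learned) graph over moments: edge weight = temporal IoU, rows sum to 1."""
    normalized: List[List[float]] = []
    for m in moments:
        row = [temporal_iou(m, n) for n in moments]
        total = sum(row)  # >= 1, because each moment has IoU 1 with itself (a self-loop)
        normalized.append([w / total for w in row])
    return normalized


def propagate(adj: List[List[float]], feats: List[List[float]], steps: int) -> List[List[float]]:
    """Repeatedly replace each node feature with the weighted average of its neighbours' features."""
    num_nodes, dim = len(feats), len(feats[0])
    for _ in range(steps):
        feats = [
            [sum(adj[i][k] * feats[k][d] for k in range(num_nodes)) for d in range(dim)]
            for i in range(num_nodes)
        ]
    return feats


if __name__ == "__main__":
    moments = candidate_moments(num_clips=8, scales=[2, 4])
    adj = iou_adjacency(moments)
    # Toy 1-D features (the start clip), standing in for fused video-language features.
    feats = [[float(m[0])] for m in moments]
    smoothed = propagate(adj, feats, steps=2)
    for m, f in zip(moments, smoothed):
        print(m, round(f[0], 3))
```

Running the script prints each candidate moment with its smoothed toy feature. A fixed IoU graph is about the simplest structural prior one could assume; a learned graph layer would add weight matrices and nonlinearities, and MAN additionally adjusts the graph structure itself across iterations.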



research
12/08/2019

Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language

We address the problem of retrieving a specific moment from an untrimmed...
research
12/04/2020

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

We address the problem of retrieving a specific moment from an untrimmed...
research
09/05/2018

Localizing Moments in Video with Temporal Language

Localizing moments in a longer video via natural language queries is a n...
research
07/26/2021

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

Video-and-Language Inference is a recently proposed task for joint video...
research
02/02/2021

Progressive Localization Networks for Language-based Moment Localization

This paper targets the task of language-based moment localization. The l...
research
11/04/2021

Multi-scale 2D Representation Learning for weakly-supervised moment retrieval

Video moment retrieval aims to search the moment most relevant to a give...
research
07/20/2021

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Detecting customized moments and highlights from videos given natural la...
