Video Moment Retrieval via Natural Language Queries

09/04/2020
by   Xinli Yu, et al.

In this paper, we propose a novel method for video moment retrieval (VMR) that achieves state-of-the-art (SOTA) performance on R@1 metrics and surpasses the SOTA on the high-IoU metric (R@1, IoU=0.7). First, we propose to use a multi-head self-attention mechanism, followed by a cross-attention scheme, to capture video/query interaction and long-range query dependencies from video context. The attention-based methods can model frame-to-query and query-to-frame interaction at arbitrary positions, and the multi-head setting ensures sufficient understanding of complicated dependencies. Our model has a simple architecture, which enables faster training and inference while maintaining . Second, we propose a multi-task training objective consisting of a moment segmentation task, start/end distribution prediction, and a start/end location regression task. We have verified that start/end predictions are noisy due to annotator disagreement, and that joint training with the moment segmentation task provides richer information, since frames inside the target clip are also used as positive training examples. Third, we propose to use an early fusion approach, which achieves better performance at the cost of inference time. However, inference time is not a problem for our model, since its simple architecture enables efficient training and inference.
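
The evaluation metric and the three training tasks named in the abstract can be illustrated with a minimal sketch. The R@1 computation follows the standard definition (the fraction of queries whose top-ranked moment reaches the IoU threshold against the annotated moment). The target construction is only an assumption about how such labels might be built: the abstract does not specify the label format, and the names multitask_targets, start_dist, and end_dist are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) of a moment, in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1: List[Span], gts: List[Span], threshold: float = 0.7) -> float:
    """Fraction of queries whose top-ranked span reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1, gts))
    return hits / len(gts)


def multitask_targets(gt: Span, num_frames: int, duration: float):
    """Hypothetical per-frame targets for the three tasks (assumed label format)."""
    frames_per_second = num_frames / duration
    start_idx = min(num_frames - 1, int(gt[0] * frames_per_second))
    end_idx = min(num_frames - 1, max(start_idx, int(gt[1] * frames_per_second)))
    # Moment segmentation: every frame inside the clip is a positive example.
    segmentation = [1 if start_idx <= i <= end_idx else 0 for i in range(num_frames)]
    # Start/end distribution prediction: one-hot boundary frames (assumption).
    start_dist = [1 if i == start_idx else 0 for i in range(num_frames)]
    end_dist = [1 if i == end_idx else 0 for i in range(num_frames)]
    # Start/end location regression: boundaries normalised to [0, 1] (assumption).
    regression = (gt[0] / duration, gt[1] / duration)
    return segmentation, start_dist, end_dist, regression
```

Under these assumptions, a 3-second annotated moment in a video sampled at one frame per second yields four positive segmentation frames but only one positive start frame and one positive end frame, which is one way to read the abstract's claim that the segmentation task supplies richer supervision than the boundary labels alone.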


