Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair

06/25/2021
by   Sho Maeoki, et al.

In this paper, we undertake the task of text-based video moment retrieval from a corpus of videos. To train the model, text-moment paired datasets are used to learn the correct correspondences. In typical training schemes, ground-truth text-moment pairs are treated as positive pairs, and all other pairs as negative. However, some pairs beyond the ground truth should also be regarded as positive: one text annotation can be positive for many video moments, and conversely, one video moment can correspond to many text annotations. Thus, there are many-to-many correspondences between text annotations and video moments. Based on these correspondences, we can form potentially relevant pairs, which are not given as ground truth yet are not negative; effectively incorporating such pairs into training can improve retrieval performance. Since a text query describes what is happening in a video moment, video moments annotated with similar texts are likely to contain similar actions, so these pairs can be considered potentially relevant. In this paper, we propose a novel training method that exploits potentially relevant pairs, which are detected through linguistic analysis of the text annotations. Experiments on two benchmark datasets show that our method improves retrieval performance both quantitatively and qualitatively.
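The core idea of the abstract can be illustrated with a contrastive training objective in which potentially relevant pairs are simply excluded from the negative set. The sketch below is a minimal, hypothetical NumPy implementation, not the authors' actual loss: `sim` is a text-to-moment similarity matrix whose diagonal holds the ground-truth pairs, and `relevant_mask` marks off-diagonal pairs detected as potentially relevant (e.g. via linguistic analysis of the annotations), so they are not penalized as negatives.

```python
import numpy as np

def masked_contrastive_loss(sim, relevant_mask, temperature=0.1):
    """Softmax cross-entropy over a text-to-moment similarity matrix.

    sim           : (N, N) array; sim[i, i] is the ground-truth pair score.
    relevant_mask : (N, N) bool array; True marks potentially relevant
                    (non-ground-truth) pairs to exclude from the negatives.
    """
    logits = sim / temperature
    # Send potentially relevant off-diagonal logits to -inf so they drop
    # out of the softmax denominator and no longer compete with the
    # ground-truth pair on the diagonal.
    diag = np.eye(len(sim), dtype=bool)
    masked = np.where(relevant_mask & ~diag, -np.inf, logits)
    # Standard cross-entropy with the diagonal as the target class.
    log_prob = masked - np.log(np.exp(masked).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Masking a high-similarity but unlabeled pair lowers the loss for that row, which is exactly the effect of treating it as "not negative" rather than forcing it apart from the query.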


