Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval

10/23/2022
by Minjoon Jung, et al.

Video corpus moment retrieval (VCMR) is the task of retrieving the most relevant video moment from a large video corpus using a natural language query. For narrative videos, e.g., dramas or movies, a holistic understanding of temporal dynamics and multimodal reasoning is crucial. Previous works have shown promising results; however, they rely on expensive query annotations for VCMR, i.e., the corresponding moment intervals. To overcome this problem, we propose a self-supervised learning framework: Modal-specific Pseudo Query Generation Network (MPGN). First, MPGN selects candidate temporal moments via subtitle-based moment sampling. Then, it generates pseudo queries that exploit both visual and textual information from the selected temporal moments. Using the multimodal information in the pseudo queries, we show that MPGN successfully learns to localize moments in the video corpus without any explicit annotation. We validate the effectiveness of MPGN on the TVR dataset, showing competitive results compared with both supervised models and models in the unsupervised setting.
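
The two steps named in the abstract (subtitle-based moment sampling, then pseudo query generation from visual and textual information) can be pictured with a minimal Python sketch. The abstract does not give the actual procedure, so everything here is an assumption made for illustration: the `Subtitle` type, treating a candidate moment as a run of consecutive subtitles, the `max_span` parameter, and forming a pseudo query by joining a visual caption with the subtitle text. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subtitle:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


# A candidate moment: (start time, end time, concatenated subtitle text).
Moment = Tuple[float, float, str]


def sample_candidate_moments(subs: List[Subtitle], max_span: int = 3) -> List[Moment]:
    """Enumerate runs of 1..max_span consecutive subtitles as candidate moments.

    Assumption: the paper's sampling strategy may differ; this only
    illustrates using subtitle timestamps to propose temporal boundaries.
    """
    moments: List[Moment] = []
    for i in range(len(subs)):
        for j in range(i, min(i + max_span, len(subs))):
            window = subs[i:j + 1]
            text = " ".join(s.text for s in window)
            moments.append((window[0].start, window[-1].end, text))
    return moments


def build_pseudo_query(visual_caption: str, subtitle_text: str) -> str:
    """Join visual and textual information into one pseudo query string.

    Assumption: `visual_caption` stands in for the output of some visual
    model; a plain concatenation is used only to keep the sketch short.
    """
    return f"{visual_caption.strip()} {subtitle_text.strip()}".strip()
```

Each (moment, pseudo query) pair produced this way could then serve as a training example in place of a human-annotated query and interval, which is the role the abstract assigns to pseudo queries.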
