Temporal Localization of Moments in Video Collections with Natural Language

07/30/2019
by Victor Escorcia, et al.

In this paper, we introduce the task of retrieving relevant video moments from a large corpus of untrimmed, unsegmented videos given a natural language query. The task poses unique challenges, as a system must efficiently both identify the relevant videos and localize the relevant moments within them. This contrasts with prior work, which either localizes relevant moments in a single video or searches a large collection of already-segmented videos. For our task, we introduce Clip Alignment with Language (CAL), a model that aligns features for a natural language query to the sequence of short video clips that compose a candidate moment in a video. Our approach goes beyond prior work that aggregates video features over a candidate moment by allowing for finer clip-level alignment. Moreover, our approach is amenable to efficient indexing of the resulting clip-level representations, which makes it suitable for moment localization in large video collections. We evaluate our approach on three recently proposed datasets for temporal localization of moments in video with natural language, extended to our video corpus moment retrieval setting: DiDeMo, Charades-STA, and ActivityNet-captions. Our CAL model outperforms the recently proposed Moment Context Network (MCN) on all criteria across all datasets on our proposed task, improving both average recall and median rank, and achieves 5x faster retrieval and 8x smaller index size with a 500K video corpus.
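
To make the clip-level idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the query and every clip are already embedded in a shared space, and it assumes a candidate moment is scored by the mean squared Euclidean distance between the query and the clips it spans; the abstract does not specify the scoring function. The names `squared_distance` and `rank_moments` and the `max_len` parameter are hypothetical, and the brute-force scan stands in for whatever index a real system would use.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_moments(
    query: Vector,
    corpus: Dict[str, List[Vector]],
    max_len: int = 3,
) -> List[Tuple[float, str, int, int]]:
    """Rank candidate moments across a corpus of videos (lower score is better).

    corpus maps a video id to its ordered list of clip embeddings.
    A candidate moment is the clip range [start, end) of one video,
    spanning at most max_len clips (an assumed cap, for illustration).
    """
    results = []
    for video_id, clips in corpus.items():
        # Each clip-to-query distance is computed once per clip and then
        # reused by every candidate moment that contains that clip.
        dists = [squared_distance(query, clip) for clip in clips]
        for start in range(len(clips)):
            last_end = min(start + max_len, len(clips))
            for end in range(start + 1, last_end + 1):
                window = dists[start:end]
                score = sum(window) / len(window)  # assumed aggregation: mean
                results.append((score, video_id, start, end))
    results.sort()
    return results


# Toy corpus with 2-D embeddings (made-up values, for illustration only).
toy_corpus = {
    "a": [[0, 0], [1, 1], [5, 5]],
    "b": [[9, 9], [1, 0]],
}
ranking = rank_moments([1, 1], toy_corpus)
best_score, best_video, best_start, best_end = ranking[0]
```

Because the per-clip distances are shared by all moments in a video, only clip-level representations need to be stored, which is the property the abstract points to when it calls the approach amenable to efficient indexing.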


