Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention

08/20/2019
by Cristian Rodriguez Opazo, et al.

This paper studies the problem of temporal moment localization in a long untrimmed video using natural language as the query. Given an untrimmed video and a query sentence, the goal is to determine the start and end of the visual moment in the video that corresponds to the query. While previous works have tackled this task with a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, proposal-free approach that relies on three key components: a dynamic filter that transfers language information to the visual domain, a new loss function that guides the model to attend to the most relevant parts of the video, and soft labels that model annotation uncertainty. We evaluate our method on two benchmark datasets, Charades-STA and ActivityNet-Captions. Experimental results show that our approach outperforms state-of-the-art methods on both datasets.
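The abstract does not say how the soft labels are constructed. As a minimal sketch, assuming they are Gaussian distributions over clip indices centered on the annotated start and end, and assuming a KL-divergence comparison against the predicted distributions, the idea might look like the following. The function names, `sigma`, and the clip indices are hypothetical and chosen only for illustration.

```python
import math

def soft_labels(num_steps, center, sigma=1.0):
    """Normalized Gaussian over clip indices, centered on an annotated boundary.

    Assumption: a Gaussian is one plausible way to spread probability mass
    around an uncertain annotation; the abstract does not confirm this form.
    """
    weights = [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(num_steps)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) between two discrete distributions."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(target, predicted)
        if p > 0
    )

# Hypothetical example: a 10-clip video with the moment annotated from clip 3 to clip 7.
start_target = soft_labels(num_steps=10, center=3, sigma=1.0)
end_target = soft_labels(num_steps=10, center=7, sigma=1.0)

# A uniform prediction stands in for an untrained model's output.
uniform = [1.0 / 10] * 10
loss = kl_divergence(start_target, uniform) + kl_divergence(end_target, uniform)
```

Compared with a one-hot target, a soft target penalizes a prediction that is off by one clip much less than one that is far away, which is one way to reflect disagreement among annotators about exact boundaries.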

research
10/13/2020

DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video

This paper studies the task of temporal moment localization in a long un...
research
09/22/2021

Natural Language Video Localization with Learnable Moment Proposals

Given an untrimmed video and a natural language query, Natural Language ...
research
09/01/2020

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

The query-based moment retrieval is a problem of localising a specific c...
research
04/01/2021

A Survey on Natural Language Video Localization

Natural language video localization (NLVL), which aims to locate a targe...
research
08/11/2019

Exploiting Temporal Relationships in Video Moment Localization with Natural Language

We address the problem of video moment localization with natural languag...
research
09/07/2021

Learning to Combine the Modalities of Language and Video for Temporal Moment Localization

Temporal moment localization aims to retrieve the best video segment mat...
research
08/10/2022

Exploring Anchor-based Detection for Ego4D Natural Language Query

In this paper we provide the technique report of Ego4D natural language ...
