To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression

04/19/2018
by   Yitian Yuan, et al.

Given an untrimmed video and a sentence description, temporal sentence localization aims to automatically determine the start and end points of the described sentence within the video. The problem is challenging because it requires understanding both the video and the sentence. Existing research predominantly employs a costly "scan and localize" framework, neglecting the global video context and the specific details within sentences, both of which are critical to this problem. In this paper, we propose a novel Attention Based Location Regression (ABLR) approach that addresses temporal sentence localization from a global perspective. Specifically, to preserve context information, ABLR first encodes both the video and the sentence via Bidirectional LSTM networks. Then, a multi-modal co-attention mechanism is introduced to generate not only video attention, which reflects the global video structure, but also sentence attention, which highlights the crucial details for temporal localization. Finally, a novel attention based location regression network is designed to predict the temporal coordinates of the sentence query from the attention produced in the previous step. ABLR is jointly trained in an end-to-end manner. Comprehensive experiments on the ActivityNet Captions and TACoS datasets demonstrate both the effectiveness and the efficiency of the proposed ABLR approach.
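The last two stages described above (attention over video clips, then regression of temporal coordinates from that attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the helper names `video_attention` and `regress_location`, the toy features, and the weight vectors are all assumptions made for illustration, and the Bidirectional LSTM encoders and the sentence-side attention are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def video_attention(clip_feats, sentence_vec):
    """Hypothetical helper: dot-product attention of a sentence vector over clips."""
    scores = [sum(c * s for c, s in zip(clip, sentence_vec)) for clip in clip_feats]
    return softmax(scores)


def regress_location(attention, w_start, w_end, b_start=0.0, b_end=0.0):
    """Hypothetical helper: map attention weights to normalized (start, end).

    A single linear layer plus sigmoid stands in for the regression network;
    the outputs are sorted so the interval is always well ordered.
    """
    s = sigmoid(sum(a * w for a, w in zip(attention, w_start)) + b_start)
    e = sigmoid(sum(a * w for a, w in zip(attention, w_end)) + b_end)
    return (min(s, e), max(s, e))


# Toy inputs (assumed): four clips with 2-d features and one sentence vector.
clip_feats = [[0.1, 0.0], [0.9, 0.2], [1.0, 0.1], [0.0, 0.3]]
sentence_vec = [2.0, 0.0]

# Untrained, made-up regression weights, one per clip.
w_start = [-2.0, -1.0, 0.5, 1.0]
w_end = [-1.0, 0.5, 1.5, 2.0]

attention = video_attention(clip_feats, sentence_vec)
start, end = regress_location(attention, w_start, w_end)
print("attention:", [round(a, 3) for a in attention])
print("normalized span:", round(start, 3), round(end, 3))
```

The point of the sketch is the contrast with "scan and localize": the whole video is processed once, and the start and end points are regressed directly as normalized coordinates instead of being chosen from scored candidate windows.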


