Recurrent Memory Addressing for describing videos

11/20/2016
by Arnav Kumar Jain, et al.

In this paper, we extend Key-Value Memory Networks to a multimodal setting and introduce a novel key-addressing mechanism for sequence-to-sequence models. The proposed model naturally decomposes the problem of video captioning into vision and language segments, treating them as key-value pairs. More specifically, we learn a semantic embedding (v) corresponding to each frame (k) in the video, thereby creating (k, v) memory slots. In the memory-addressing scheme, we propose computing the attention weights at each step conditioned on the previous attention distributions over the key-value memory slots. Exploiting this flexibility of the framework, we additionally capture spatial dependencies while mapping from the visual to the semantic embedding. Experiments on the Youtube2Text dataset demonstrate the usefulness of recurrent key-addressing, and the model achieves competitive BLEU@4 and METEOR scores against state-of-the-art models.
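The abstract does not give the addressing equations, so the following is only a minimal sketch of the general idea: attention over key-value slots whose scores at each step also depend on the previous step's attention. The additive scoring rule, the `carry` weight, and the names `address_memory`, `softmax`, and `dot` are all assumptions made for illustration; they are not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def address_memory(keys, values, query, prev_attention, carry=1.0):
    """One addressing step over (key, value) memory slots.

    ASSUMPTION: each slot's score is the key-query dot product plus
    `carry` times the slot's previous attention weight. This additive
    form is a stand-in, not the scoring rule from the paper.
    """
    scores = [dot(k, query) + carry * p
              for k, p in zip(keys, prev_attention)]
    attention = softmax(scores)
    # Read vector: attention-weighted sum of the value embeddings.
    dim = len(values[0])
    read = [sum(a * v[d] for a, v in zip(attention, values))
            for d in range(dim)]
    return attention, read


# Toy memory: three frames (keys) with three semantic embeddings (values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attention = [1.0 / 3] * 3            # uniform attention before the first step
for query in ([1.0, 0.0], [0.0, 1.0]):   # hypothetical decoder states
    attention, read = address_memory(keys, values, query, attention)
```

Setting `carry=0.0` reduces this to ordinary content-based attention, which makes the effect of conditioning on the previous distribution easy to isolate in the toy example.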


research
11/17/2016

Multimodal Memory Modelling for Video Captioning

Video captioning which automatically translates video clips into natural...
research
05/08/2019

Multimodal Semantic Attention Network for Video Captioning

Inspired by the fact that different modalities in videos carry complemen...
research
05/10/2019

Memory-Attended Recurrent Network for Video Captioning

Typical techniques for video captioning follow the encoder-decoder frame...
research
01/02/2021

Video Captioning in Compressed Video

Existing approaches in video captioning concentrate on exploring global ...
research
03/13/2022

Global2Local: A Joint-Hierarchical Attention for Video Captioning

Recently, automatic video captioning has attracted increasing attention,...
research
06/28/2019

ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network

In recent years, memory-augmented neural networks(MANNs) have shown prom...
research
07/28/2023

Universal Recurrent Event Memories for Streaming Data

In this paper, we propose a new event memory architecture (MemNet) for r...
