Video Fill in the Blank with Merging LSTMs

10/13/2016
by Amir Mazaheri, et al.

Given a video and its incomplete textual description with missing words, the Video-Fill-in-the-Blank (ViFitB) task is to automatically find the missing word. The contextual information of the sentences is important for inferring the missing words; the visual cues are even more crucial for an accurate inference. In this paper, we present a new method which intuitively takes advantage of the structure of the sentences and employs merging LSTMs (to merge two LSTMs) to tackle the problem with embedded textual and visual cues. In the experiments, we demonstrate the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.
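
The abstract does not spell out how the sentence structure is used. One plausible reading, offered here only as an assumption, is that the sentence is split at the blank into a left and a right fragment, each of which could then be read by its own LSTM before the two are merged. The sketch below illustrates just that input-preparation step; the `BLANK` marker, the function name `split_at_blank`, and the choice to reverse the right fragment are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical blank marker; the dataset's actual token may differ.
BLANK = "_____"


def split_at_blank(sentence: str, blank: str = BLANK) -> Tuple[List[str], List[str]]:
    """Split a whitespace-tokenized sentence around its first blank.

    Returns (left_context, right_context_reversed). Reversing the right
    fragment is an assumption: it makes both sequences end on the word
    adjacent to the blank, which suits two sequence encoders that are
    later merged.
    """
    tokens = sentence.split()
    if blank not in tokens:
        raise ValueError("sentence contains no blank token")
    i = tokens.index(blank)
    left = tokens[:i]
    right_reversed = tokens[i + 1:][::-1]
    return left, right_reversed


left, right = split_at_blank("She opens the _____ and walks out")
print(left)   # ['She', 'opens', 'the']
print(right)  # ['out', 'walks', 'and']
```

In such a setup, each fragment would be embedded and fed to a separate recurrent encoder, and the two final states would be combined with the visual features before predicting the missing word; those stages are omitted here because the abstract gives no details on them.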

