Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network

06/02/2016
by Yu Liu, et al.

Visual storytelling aims to generate human-level narrative language (i.e., a natural paragraph with multiple sentences) from a photo stream. A typical photo story consists of a global timeline with multi-thread local storylines, where each storyline takes place in a different scene. Such a complex structure leads to large content gaps at scene transitions between consecutive photos. Most existing image/video captioning methods achieve only limited performance, because the units in traditional recurrent neural networks (RNNs) tend to "forget" the previous state when the visual sequence is inconsistent. In this paper, we propose a novel visual storytelling approach with a Bidirectional Multi-thread Recurrent Neural Network (BMRNN). First, based on the mined local storylines, a skip gated recurrent unit (sGRU) with delay control is proposed to maintain longer-range visual information. Second, using sGRUs as basic units, the BMRNN is trained to align the local storylines into the global sequential timeline. Third, a new training scheme with a storyline-constrained objective function is proposed that jointly considers both global and local matches. Experiments on three standard storytelling datasets show that the BMRNN model outperforms the state-of-the-art methods.
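The abstract does not give the sGRU equations, so the following is only a rough sketch of the general idea: a GRU-style cell that, besides the previous hidden state, can also read the hidden state of an earlier photo from the same storyline. Everything here is an assumption made for illustration and not the authors' formulation: the extra skip gate `s`, the skip weight matrix `V`, the class name `SkipGRUCell`, the `skip_from` index list, and the toy two-dimensional photo features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(t) for t in zip(*vectors)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SkipGRUCell:
    """Hypothetical GRU cell with one extra gated input from a skipped state."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight pair per gate: update (z), reset (r), skip (s), candidate (h).
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrsh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrsh"}
        # Assumed weights for the skip path into the candidate state.
        self.V = rand_matrix(hidden_size, hidden_size, rng)

    def _gate(self, name, x, h_prev):
        pre = add(matvec(self.W[name], x), matvec(self.U[name], h_prev))
        return [sigmoid(a) for a in pre]

    def step(self, x, h_prev, h_skip):
        z = self._gate("z", x, h_prev)
        r = self._gate("r", x, h_prev)
        s = self._gate("s", x, h_prev)
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        skip_h = [si * hi for si, hi in zip(s, h_skip)]
        pre = add(matvec(self.W["h"], x),
                  matvec(self.U["h"], reset_h),
                  matvec(self.V, skip_h))
        cand = [math.tanh(a) for a in pre]
        # Standard GRU interpolation between the old state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def run_sequence(cell, inputs, skip_from):
    """skip_from[t] is the index of an earlier photo in the same storyline, or None."""
    zeros = [0.0] * cell.hidden_size
    states, h = [], zeros
    for t, x in enumerate(inputs):
        p = skip_from[t]
        h_skip = states[p] if p is not None else zeros
        h = cell.step(x, h, h_skip)
        states.append(h)
    return states


# Toy stream: photos 0 and 1 share a scene, photo 2 is a different scene,
# and photo 3 returns to the first scene, so it skips back to photo 1.
photos = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.85, 0.15]]
skip_from = [None, None, None, 1]
cell = SkipGRUCell(input_size=2, hidden_size=3)
states = run_sequence(cell, photos, skip_from)
```

In this sketch the skip path only changes the state at photo 3, the one photo that has a storyline predecessor across a scene gap. The other steps reduce to an ordinary GRU update because their skipped state is a zero vector.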

