Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

02/03/2020
by Yunjae Jung, et al.

Visual storytelling is the task of creating a short story based on a photo stream. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions but also human-like narration and semantics. However, each story in the VIST dataset consists only of a small, fixed number of photos. Therefore, the main challenge of visual storytelling is to fill in the visual gaps between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full, plausible story even with the missing photo(s). Furthermore, we propose a hide-and-tell model for visual storytelling, which is designed to learn non-local relations across photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our hide-and-tell scheme and the network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods on automatic metrics. Finally, we qualitatively show the learned ability to interpolate a storyline over visual gaps.
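
The training-time omission described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `hide_photos`, the `max_hidden` parameter, and the choice to replace a hidden photo's feature vector with zeros (so the stream keeps its length) are all assumptions made for illustration; the abstract only states that one or more photos are randomly omitted from the input.

```python
import random
from typing import List, Optional, Sequence, Tuple


def hide_photos(
    stream: Sequence[List[float]],
    max_hidden: int = 2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Randomly hide between 1 and max_hidden photos in a photo stream.

    Each photo is represented by a feature vector (a list of floats).
    Assumption: a hidden photo is replaced by an all-zero vector, so the
    stream keeps its original length and ordering.
    """
    if len(stream) < 2:
        raise ValueError("need at least two photos so one can stay visible")
    if max_hidden < 1:
        raise ValueError("max_hidden must be at least 1")
    rng = rng or random.Random()

    # Never hide every photo: at least one must remain visible.
    limit = min(max_hidden, len(stream) - 1)
    num_hidden = rng.randint(1, limit)
    hidden = sorted(rng.sample(range(len(stream)), num_hidden))

    masked = [
        [0.0] * len(features) if i in hidden else list(features)
        for i, features in enumerate(stream)
    ]
    return masked, hidden


# Example: a five-photo stream with 3-dimensional placeholder features.
photo_stream = [[float(i + 1)] * 3 for i in range(5)]
masked_stream, hidden_indices = hide_photos(
    photo_stream, max_hidden=2, rng=random.Random(0)
)
```

In a training loop under these assumptions, `masked_stream` would be fed to the model while the target story stays the full story written for all five photos, so the model has to bridge the hidden positions from context.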

research
06/02/2016

Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network

Visual storytelling aims to generate human-level narrative language (i.e...
research
08/09/2017

Hierarchically-Attentive RNN for Album Summarization and Storytelling

We address the problem of end-to-end visual storytelling. Given a photo ...
research
03/06/2019

Dixit: Interactive Visual Storytelling via Term Manipulation

In this paper, we introduce Dixit, an interactive visual storytelling sys...
research
04/14/2016

Learning Visual Storylines with Skipping Recurrent Neural Networks

What does a typical visit to Paris look like? Do people first take photo...
research
07/24/2018

Slots-Memento : A System Facilitating Intergenerational Story Sharing and Preservation of Family Mementos

Family mementos document events shaping family life, telling a story wit...
research
05/03/2018

The Effect of Computer-Generated Descriptions on Photo-Sharing Experiences of People with Visual Impairments

Like sighted people, visually impaired people want to share photographs ...
research
04/01/2020

Background Matting: The World is Your Green Screen

We propose a method for creating a matte – the per-pixel foreground colo...
