A Hierarchical Approach for Visual Storytelling Using Image Description

09/26/2019
by Md Sultan Al Nahian, et al.

One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset and show that our method outperforms state-of-the-art techniques on a suite of automatic evaluation metrics. The empirical results from this evaluation demonstrate the necessity of the different components of our proposed architecture and show the effectiveness of the architecture for visual storytelling.

research
05/15/2018

Stories for Images-in-Sequence by using Visual and Narrative Components

Recent research in AI is focusing towards generating narrative stories a...
research
12/03/2020

BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling

Visual storytelling is a creative and challenging task, aiming to automa...
research
10/22/2022

EtriCA: Event-Triggered Context-Aware Story Generation Augmented by Cross Attention

One of the key challenges of automatic story generation is how to genera...
research
11/23/2022

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

There has been a recent explosion of impressive generative models that c...
research
06/03/2018

Contextualize, Show and Tell: A Neural Visual Storyteller

We present a neural model for generating short stories from image sequen...
research
04/13/2016

Visual Storytelling

We introduce the first dataset for sequential vision-to-language, and ex...
research
04/07/2017

Egocentric Video Description based on Temporally-Linked Sequences

Egocentric vision consists in acquiring images along the day from a firs...
