Album Storytelling with Iterative Story-aware Captioning and Large Language Models

05/22/2023
by   Munan Ning, et al.

This work studies how to transform an album into vivid and coherent stories, a task we refer to as "album storytelling". While this task can help preserve memories and facilitate experience sharing, it remains underexplored in the current literature. With recent advances in Large Language Models (LLMs), it is now possible to generate lengthy, coherent text, opening up the opportunity to develop an AI assistant for album storytelling. One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story. However, we find that this often results in stories containing hallucinated information that contradicts the images, because each generated caption is "story-agnostic": it does not always describe content relevant to the whole story, or it misses necessary information. To address these limitations, we propose a new iterative album storytelling pipeline. Specifically, we start with an initial story and build a story-aware caption model that refines the captions using the whole story as guidance. The polished captions are then fed into the LLMs to generate a new, refined story. This process is repeated iteratively until the story contains minimal factual errors while maintaining coherence. To evaluate our proposed pipeline, we introduce a new dataset of image collections from vlogs and a set of systematic evaluation metrics. Our results demonstrate that our method effectively generates more accurate and engaging stories for albums, with enhanced coherence and vividness.
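The iterative loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the callable interfaces (`caption`, `write_story`, `count_errors`), the `max_rounds` and `tolerance` parameters, and the choice to build the initial story from story-agnostic captions are all hypothetical. In practice the captioner and story writer would wrap a caption model and an LLM, and the stopping test would be some factuality check.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces; none of these names come from the paper.
Captioner = Callable[[str, str], str]               # (photo, story so far) -> caption
StoryWriter = Callable[[Sequence[str]], str]        # captions -> story
ErrorCounter = Callable[[str, Sequence[str]], int]  # (story, photos) -> factual errors


def iterative_storytelling(
    photos: Sequence[str],
    caption: Captioner,
    write_story: StoryWriter,
    count_errors: ErrorCounter,
    max_rounds: int = 3,   # assumed cap on refinement rounds
    tolerance: int = 0,    # assumed acceptable number of factual errors
) -> str:
    # Initial pass: story-agnostic captions (no story available as guidance yet).
    captions: List[str] = [caption(p, "") for p in photos]
    story = write_story(captions)

    for _ in range(max_rounds):
        # Stop once the story is judged accurate enough.
        if count_errors(story, photos) <= tolerance:
            break
        # Story-aware pass: re-caption each photo with the whole story as guidance,
        # then regenerate the story from the refined captions.
        captions = [caption(p, story) for p in photos]
        story = write_story(captions)
    return story


# Toy stand-ins so the sketch runs without any model; a "photo" is a
# "|"-separated string of the things visible in it.
def toy_caption(photo: str, story: str) -> str:
    tags = photo.split("|")
    # Without story guidance only the first detail is described.
    return tags[0] if not story else " and ".join(tags)


def toy_write_story(captions: Sequence[str]) -> str:
    return "Then: " + "; ".join(captions)


def toy_count_errors(story: str, photos: Sequence[str]) -> int:
    # Counts details visible in the photos that the story fails to mention.
    return sum(1 for p in photos for t in p.split("|") if t not in story)


if __name__ == "__main__":
    print(iterative_storytelling(
        ["beach|dog", "sunset"], toy_caption, toy_write_story, toy_count_errors
    ))
```

With the toy stand-ins, the first story omits a detail that the story-agnostic caption missed, and one story-aware round recovers it. Passing the models in as callables keeps the loop itself independent of any particular caption model or LLM.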


research
12/16/2022

Neural Story Planning

Automated plot generation is the challenge of generating a sequence of e...
research
10/31/2018

Picking Apart Story Salads

During natural disasters and conflicts, information about what happened ...
research
06/23/2016

Sort Story: Sorting Jumbled Images and Captions into Stories

Temporal common sense has applications in AI tasks such as QA, multi-doc...
research
12/10/2021

Unsupervised Editing for Counterfactual Stories

Creating what-if stories requires reasoning about prior statements and p...
research
05/14/2021

Plot and Rework: Modeling Storylines for Visual Storytelling

Writing a coherent and engaging story is not easy. Creative writers use ...
research
11/20/2022

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Conditioned diffusion models have demonstrated state-of-the-art text-to-...
research
09/18/2023

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis

The excellent text-to-image synthesis capability of diffusion models has...
