Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis

09/18/2023
by Tianyi Song, et al.

The strong text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally: it concatenates them in order with equal weights, ignoring the fact that not all historical conditions are relevant to the generation of the current frame. To address this issue, we propose Causal-Story, a model that incorporates a local causal attention mechanism that considers the causal relationship between previous captions, previous frames, and the current caption. Causal-Story weights the historical conditions according to this relationship when generating the current frame, thereby improving the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and obtained state-of-the-art FID scores; the generated frames also convey the story more clearly.
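To make the idea of weighting historical conditions concrete, here is a minimal sketch of generic local causal attention weights in plain Python. It is an illustration under stated assumptions, not the Causal-Story implementation: the function names, the `window` parameter, and the precomputed relevance scores are all hypothetical, and the abstract does not specify how the paper builds its mask or scores.

```python
import math

def local_causal_mask(n, window):
    """Hypothetical helper: mask[i][j] is True when step i may attend to step j.

    Causal: only j <= i (no looking at future steps).
    Local: only the most recent `window` steps, the current one included.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[(j <= i) and (i - j < window) for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions; masked positions get weight 0."""
    allowed = [s for s, m in zip(scores, mask_row) if m]
    peak = max(allowed)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(score_matrix, window):
    """Turn an n x n matrix of relevance scores into per-step attention weights."""
    n = len(score_matrix)
    mask = local_causal_mask(n, window)
    return [masked_softmax(score_matrix[i], mask[i]) for i in range(n)]

# Usage: four story steps with made-up relevance scores (assumption: higher
# means the earlier step matters more for the current one).
scores = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 1.0, 0.0, 0.0],
    [0.9, 0.1, 1.0, 0.0],
    [0.3, 0.8, 0.4, 1.0],
]
weights = attention_weights(scores, window=2)
```

With `window=2`, each step distributes its weight over itself and the step immediately before it, in proportion to the scores, instead of weighting the whole history equally.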


research
11/20/2022

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Conditioned diffusion models have demonstrated state-of-the-art text-to-...
research
05/26/2023

Improved Visual Story Generation with Adaptive Context Modeling

Diffusion models developed on top of powerful text-to-image generation m...
research
09/13/2022

StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

Recent advances in text-to-image synthesis have led to large pretrained ...
research
05/22/2023

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

This work studies how to transform an album to vivid and coherent storie...
research
05/28/2018

GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation

The task of multi-image cued story generation, such as visual storytelli...
research
11/23/2022

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

There has been a recent explosion of impressive generative models that c...
research
12/10/2021

Unsupervised Editing for Counterfactual Stories

Creating what-if stories requires reasoning about prior statements and p...
