Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models

02/08/2023
by Hyeonho Jeong, et al.

Recent advances in large-scale text-to-image models have opened new possibilities for guiding image creation through human-written natural language. However, while prior literature has focused primarily on generating individual images, real-world applications such as storytelling also require these models to keep a sequence of images coherent. To address this, we present a novel neural pipeline for generating a coherent storybook from the plain text of a story. Specifically, we combine a pre-trained Large Language Model with a text-guided Latent Diffusion Model to generate coherent images. Whereas previous story synthesis frameworks typically require a large-scale text-to-image model trained on expensive image-caption pairs to maintain coherency, we employ simple textual inversion techniques together with detector-based semantic image editing, which enables zero-shot generation of a coherent storybook. Experimental results show that our proposed method outperforms state-of-the-art image editing baselines.
