Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

07/26/2022
by Robin Rombach, et al.

Novel architectures have recently improved generative image synthesis, leading to excellent visual quality in various tasks. Of particular note is the field of “AI-Art”, which has seen unprecedented growth with the emergence of powerful multimodal models such as CLIP. By combining language and image synthesis models, so-called “prompt-engineering” has become established, in which carefully selected and composed sentences are used to achieve a certain visual style in the synthesized image. In this note, we present an alternative approach based on retrieval-augmented diffusion models (RDMs). In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples. During inference (sampling), we replace the retrieval database with a more specialized database that contains, for example, only images of a particular visual style. This provides a novel way to prompt a generally trained model after training and thereby specify a particular visual style. As shown by our experiments, this approach is superior to specifying the visual style within the text prompt. We open-source code and model weights at https://github.com/CompVis/latent-diffusion.
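
The database swap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `retrieve_neighbors`, the use of cosine similarity, the embedding size of 64, the value k=4, and the random arrays standing in for image and prompt embeddings are all assumptions made for illustration. The sketch only shows that the retrieval step stays the same while the database it searches is exchanged at sampling time.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_neighbors(query, database, k):
    """Return indices of the k database rows most similar to the query.

    Similarity is cosine similarity (an assumption for this sketch).
    """
    sims = normalize(database) @ normalize(query)
    return np.argsort(-sims)[:k]


rng = np.random.default_rng(0)

# Hypothetical stand-ins for precomputed embeddings.
train_db = rng.normal(size=(1000, 64))  # general database used during training
style_db = rng.normal(size=(50, 64))    # specialized database, e.g. one visual style
query = rng.normal(size=64)             # embedding of a training instance or prompt

# Training time: neighbors come from the general database.
train_idx = retrieve_neighbors(query, train_db, k=4)

# Sampling time: same retrieval call, but the database is swapped.
style_idx = retrieve_neighbors(query, style_db, k=4)

# These retrieved vectors would be passed to the diffusion model as conditioning.
conditioning = style_db[style_idx]
print(conditioning.shape)
```

Because only the `database` argument changes between the two calls, the model itself needs no retraining to be steered toward a new style in this sketch; how the retrieved samples are actually fed into the diffusion model is specified in the full text and the linked repository.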

research
04/25/2022

Retrieval-Augmented Diffusion Models

Generative image synthesis with diffusion models has recently achieved e...
research
08/15/2023

SGDiff: A Style Guided Diffusion Model for Fashion Synthesis

This paper reports on the development of a novel style guided diffusion ...
research
07/04/2023

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

We present SDXL, a latent diffusion model for text-to-image synthesis. C...
research
06/13/2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that ...
research
03/11/2023

PARASOL: Parametric Style Control for Diffusion Image Synthesis

We propose PARASOL, a multi-modal synthesis model that enables disentang...
research
06/01/2022

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Recently most successful image synthesis models are multi stage process ...
research
02/08/2023

GLAZE: Protecting Artists from Style Mimicry by Text-to-Image Models

Recent text-to-image diffusion models such as MidJourney and Stable Diff...
