Sketch-Guided Text-to-Image Diffusion Models

11/24/2022
by   Andrey Voynov, et al.

Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model with a spatial map from another domain (e.g., sketch) during inference time. Unlike previous works, our method does not require training a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained on only a few thousand images and constitutes a differentiable guiding map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality, which allow the technique to perform well on out-of-domain sketches, including free-hand style drawings. We focus in particular on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain. Project page: sketch-guided-diffusion.github.io
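
The guidance loop described in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the per-pixel MLP here is randomly initialized rather than trained, the feature tensor is random rather than extracted from a DDPM network, and the gradient step is applied directly to the features instead of being propagated back to the noisy image. All names (`lgp_forward`, `guidance_loss_and_grad`, the layer sizes, the step size) are hypothetical and chosen only for illustration.

```python
import numpy as np


def init_params(rng, channels, hidden):
    """Random weights for a two-layer per-pixel MLP (untrained; assumption)."""
    w1 = 0.3 * rng.standard_normal((channels, hidden))
    b1 = np.zeros(hidden)
    w2 = 0.3 * rng.standard_normal((hidden, 1))
    b2 = np.zeros(1)
    return w1, b1, w2, b2


def lgp_forward(feats, params):
    """Map (H, W, C) features to an (H, W) spatial map, one pixel at a time.

    The same weights are applied at every pixel, so the predictor is local.
    """
    w1, b1, w2, b2 = params
    pre = feats @ w1 + b1               # (H, W, hidden)
    hidden = np.maximum(pre, 0.0)       # ReLU
    pred = (hidden @ w2 + b2)[..., 0]   # (H, W)
    return pred, pre


def guidance_loss_and_grad(feats, params, target):
    """Mean squared error to the target map and its gradient w.r.t. feats."""
    w1, _, w2, _ = params
    pred, pre = lgp_forward(feats, params)
    diff = pred - target
    loss = float(np.mean(diff ** 2))
    d_pred = (2.0 / diff.size) * diff[..., None]   # (H, W, 1)
    d_hidden = d_pred @ w2.T                       # (H, W, hidden)
    d_pre = d_hidden * (pre > 0)                   # back through ReLU
    grad = d_pre @ w1.T                            # (H, W, C)
    return loss, grad


rng = np.random.default_rng(0)
params = init_params(rng, channels=8, hidden=16)
feats = rng.standard_normal((4, 4, 8))               # stand-in for deep features
sketch = (rng.random((4, 4)) > 0.5).astype(float)    # stand-in for a sketch map

initial_loss, _ = guidance_loss_and_grad(feats, params, sketch)
guided = feats.copy()
for _ in range(20):
    # Each step pushes the features toward agreement with the sketch.
    _, grad = guidance_loss_and_grad(guided, params, sketch)
    guided -= 0.5 * grad
final_loss, _ = guidance_loss_and_grad(guided, params, sketch)
print(f"loss before guidance: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because the predictor is differentiable, the mismatch between its output and the sketch yields a gradient that can steer the intermediate result. In the method described above, that gradient would flow through the denoising network to the noisy image rather than stopping at the features as it does in this toy version.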

research
02/14/2023

Text-Guided Scene Sketch-to-Photo Synthesis

We propose a method for scene-level sketch-to-photo synthesis with text ...
research
02/14/2023

DiffFaceSketch: High-Fidelity Face Image Synthesis with Sketch-Guided Latent Diffusion Model

Synthesizing face images from monochrome sketches is one of the most fun...
research
05/11/2023

Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

Classifier-free guidance is an effective sampling technique in diffusion...
research
02/05/2023

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

Diffusion models are able to generate photorealistic images in arbitrary...
research
02/28/2023

Towards Enhanced Controllability of Diffusion Models

Denoising Diffusion models have shown remarkable capabilities in generat...
research
08/27/2023

SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation

Artificial Intelligence Generated Content (AIGC) has shown remarkable pr...
research
02/20/2023

Composer: Creative and Controllable Image Synthesis with Composable Conditions

Recent large-scale generative models learned on big data are capable of ...
