Diffusion Self-Guidance for Controllable Image Generation

06/01/2023
by Dave Epstein, et al.

Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/
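
To make the idea of steering by an extracted property concrete, the sketch below is a toy illustration only, not the authors' implementation. It assumes a 1D list of non-negative weights as a stand-in for an internal map of a diffusion model, takes the weighted centroid of that map as the "location" property, and follows the gradient of a squared-error energy toward a target location. The function names (`centroid`, `energy_grad`, `guide`), the energy, the step size, and the clipping are all hypothetical choices made for this example. In the method described above, the guiding signal would instead come from the pretrained model's own representations and be applied during sampling, in the manner of classifier guidance.

```python
def centroid(attn):
    """Weighted mean position of a 1D map of non-negative weights."""
    total = sum(attn)
    return sum(i * a for i, a in enumerate(attn)) / total


def energy_grad(attn, target):
    """Gradient of (centroid - target)**2 with respect to each weight.

    Uses d(centroid)/d(attn[j]) = (j - centroid) / sum(attn).
    """
    total = sum(attn)
    c = centroid(attn)
    return [2.0 * (c - target) * (j - c) / total for j in range(len(attn))]


def guide(attn, target, step=0.05, iters=200):
    """Toy guidance loop: gradient descent on the location energy.

    Weights are clipped to a small positive floor so the map stays valid.
    """
    attn = list(attn)
    for _ in range(iters):
        grad = energy_grad(attn, target)
        attn = [max(a - step * g, 1e-6) for a, g in zip(attn, grad)]
    return attn


# Hypothetical starting map with most of its mass near position 1.
start = [0.05, 0.6, 0.2, 0.1, 0.05]
guided = guide(start, target=3.0)
print(round(centroid(start), 2), "->", round(centroid(guided), 2))
```

Other properties could be handled the same way by swapping in a different differentiable summary of the map, and several energies could be summed to combine them. That composition is the general pattern the abstract describes, although the specific property definitions here are assumptions.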


