The Stable Artist: Steering Semantics in Diffusion Latent Space

12/12/2022
by Manuel Brack, et al.

Large, text-conditioned generative diffusion models have recently gained considerable attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results in a single attempt is rarely feasible. Instead, text-guided image generation typically requires the user to make many slight changes to the inputs in order to iteratively carve out the envisioned image. Yet slight changes to the input prompt often lead to entirely different images being generated, so the artist's control is limited in its granularity. To provide flexibility, we present the Stable Artist, an image editing approach that enables fine-grained control of the image generation process. Its main component is semantic guidance (SEGA), which steers the diffusion process along a variable number of semantic directions. This allows for subtle edits to images, changes in composition and style, and optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the model's learned representation of concepts, even complex ones such as 'carbon emission'. We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.
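
The idea of steering a diffusion step along several semantic directions can be illustrated with a minimal sketch. It is not the paper's exact formulation, which the abstract does not give. It assumes that each concept contributes a direction equal to its conditioned noise estimate minus the unconditional one, added on top of standard classifier-free guidance. The function `guided_noise`, its parameters, and the default scale are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def guided_noise(
    eps_uncond: Vector,
    eps_prompt: Vector,
    eps_concepts: Sequence[Vector],
    guidance_scale: float = 7.5,
    concept_scales: Sequence[float] = (),
) -> Vector:
    """Combine noise estimates for one denoising step (illustrative only)."""
    if len(eps_concepts) != len(concept_scales):
        raise ValueError("need exactly one scale per concept")

    # Standard classifier-free guidance: move from the unconditional
    # estimate toward the prompt-conditioned estimate.
    out = [u + guidance_scale * (p - u) for u, p in zip(eps_uncond, eps_prompt)]

    # Assumed semantic steering: every concept adds its own direction
    # (concept estimate minus unconditional estimate). A positive scale
    # pushes toward the concept, a negative scale pushes away from it.
    for eps_c, scale in zip(eps_concepts, concept_scales):
        out = [o + scale * (c - u) for o, c, u in zip(out, eps_c, eps_uncond)]
    return out
```

In this sketch, a small concept scale produces a subtle edit, and passing several concepts at once corresponds to steering along a variable number of directions. In a real pipeline the vectors would be the noise predictions of the denoising network rather than plain lists.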

research
01/28/2023

SEGA: Instructing Diffusion using Semantic Dimensions

Text-to-image diffusion models have recently received a lot of interest ...
research
05/30/2023

Real-World Image Variation by Aligning Diffusion Inversion Chain

Recent diffusion model advancements have enabled high-fidelity images to...
research
08/07/2023

AvatarVerse: High-quality Stable 3D Avatar Creation from Text and Pose

Creating expressive, diverse and high-quality 3D avatars from highly cus...
research
08/23/2023

Manipulating Embeddings of Stable Diffusion Prompts

Generative text-to-image models such as Stable Diffusion allow users to ...
research
09/19/2022

The Biased Artist: Exploiting Cultural Biases via Homoglyphs in Text-Guided Image Generation Models

Text-guided image generation models, such as DALL-E 2 and Stable Diffusi...
research
11/30/2022

High-Fidelity Guided Image Synthesis with Latent Diffusion Models

Controllable image synthesis with user scribbles has gained huge public ...
research
05/05/2023

Guided Image Synthesis via Initial Image Editing in Diffusion Model

Diffusion models have the ability to generate high quality images by den...
