MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

04/17/2023
by Mingdeng Cao, et al.

Despite the success of large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results. For example, generation approaches usually fail to synthesize multiple images of the same object or character under different views or poses. Meanwhile, existing editing methods either fail to achieve complex non-rigid editing while preserving the overall textures and identity, or require time-consuming fine-tuning to capture the image-specific appearance. In this paper, we develop MasaCtrl, a tuning-free method that achieves consistent image generation and complex non-rigid image editing simultaneously. Specifically, MasaCtrl converts the existing self-attention in diffusion models into mutual self-attention, so that it can query correlated local contents and textures from source images for consistency. To further alleviate query confusion between foreground and background, we propose a mask-guided mutual self-attention strategy, where the mask can be easily extracted from the cross-attention maps. Extensive experiments show that the proposed MasaCtrl can produce impressive results in both consistent image generation and complex non-rigid real image editing.
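The mutual self-attention idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the single-head layout, and the way the foreground masks are applied (restricting which source tokens a target query may attend to, then merging by a target-side mask) are assumptions made for illustration. The only point taken from the abstract is that queries from the image being generated attend to keys and values taken from the source image, optionally guided by a foreground mask.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, key_mask=None):
    # Scaled dot-product attention for one head.
    # q: (n_q, d), k: (n_k, d), v: (n_k, d_v), key_mask: (n_k,) bool or None.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if key_mask is not None:
        # Disallowed keys get a very low score, so their weight becomes 0.
        scores = np.where(key_mask[None, :], scores, -1e9)
    return softmax(scores) @ v

def mutual_self_attention(q_target, k_source, v_source):
    # Ordinary self-attention would use the target's own keys and values;
    # here the target queries read from the source image's keys and values.
    return attention(q_target, k_source, v_source)

def masked_mutual_self_attention(q_target, k_source, v_source, src_fg, tgt_fg):
    # Assumed mask-guided variant: foreground queries read only source
    # foreground tokens, background queries only source background tokens.
    fg = attention(q_target, k_source, v_source, key_mask=src_fg)
    bg = attention(q_target, k_source, v_source, key_mask=~src_fg)
    return np.where(tgt_fg[:, None], fg, bg)
```

In a diffusion model the token arrays would come from the self-attention layers of the denoising network at a given step, and the masks would be derived from cross-attention maps as the abstract describes; both are left outside this sketch.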

research
10/20/2022

DiffEdit: Diffusion-based semantic image editing with mask guidance

Image generation has recently seen tremendous advances, with diffusion m...
research
11/15/2022

Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models

With the rise of large, publicly-available text-to-image diffusion model...
research
06/16/2023

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Text-guided image editing is widely needed in daily life, ranging from p...
research
07/01/2022

ChrSNet: Chromosome Straightening using Self-attention Guided Networks

Karyotyping is an important procedure to assess the possible existence o...
research
12/01/2022

Shape-Guided Diffusion with Inside-Outside Attention

Shape can specify key object constraints, yet existing text-to-image dif...
research
10/08/2022

Improving Fine-Grain Segmentation via Interpretable Modifications: A Case Study in Fossil Segmentation

Most interpretability research focuses on datasets containing thousands ...
research
05/29/2023

Photoswap: Personalized Subject Swapping in Images

In an era where images and visual content dominate our digital landscape...
