InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

05/29/2023
by Qian Wang, et al.

Recent works have explored text-guided image editing with diffusion models, generating edited images from text prompts. However, these models struggle to accurately locate the regions to be edited and to perform precise edits faithfully. In this work, we propose a framework termed InstructEdit that performs fine-grained editing based on user instructions. The framework has three components: a language processor, a segmenter, and an image editor. The language processor uses a large language model to parse the user instruction and output prompts for the segmenter and captions for the image editor. We adopt ChatGPT and optionally BLIP2 for this step. The segmenter takes the segmentation prompt provided by the language processor; we employ a state-of-the-art segmentation framework, Grounded Segment Anything, to automatically generate a high-quality mask from that prompt. The image editor uses the captions from the language processor and the masks from the segmenter to compute the edited image. We adopt Stable Diffusion and the mask-guided generation from DiffEdit for this purpose. Experiments show that our method outperforms previous editing methods in fine-grained editing applications where the input image contains a complex object or multiple objects. We improve the mask quality over DiffEdit and thus improve the quality of the edited images. We also show that our framework can accept multiple forms of user instructions as input. We provide the code at https://github.com/QianWangX/InstructEdit.
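
The three-stage data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' implementation: the names `ParsedInstruction`, `instruct_edit`, and the `toy_*` functions are hypothetical, and the stubs stand in for ChatGPT/BLIP2, Grounded Segment Anything, and Stable Diffusion with DiffEdit-style mask guidance. Images are plain nested lists so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[int]]     # 1 = pixel may be edited, 0 = pixel is kept


@dataclass
class ParsedInstruction:
    """Hypothetical container for the language processor's outputs."""
    segmentation_prompt: str  # passed to the segmenter
    input_caption: str        # describes the original image
    edited_caption: str       # describes the desired result


def toy_language_processor(instruction: str) -> ParsedInstruction:
    # Stand-in for the LLM step (assumption): only handles "change X to Y".
    body = instruction.lower()
    if body.startswith("change "):
        body = body[len("change "):]
    source, target = body.split(" to ", 1)
    return ParsedInstruction(
        segmentation_prompt=source,
        input_caption=f"a photo of {source}",
        edited_caption=f"a photo of {target}",
    )


def toy_segmenter(image: Image, prompt: str) -> Mask:
    # Stand-in for a text-prompted segmenter (assumption): ignores the
    # prompt and simply marks bright pixels as the region to edit.
    return [[1 if p > 0.5 else 0 for p in row] for row in image]


def toy_editor(image: Image, mask: Mask, parsed: ParsedInstruction) -> Image:
    # Stand-in for mask-guided generation (assumption): "edits" by inverting
    # pixels inside the mask and copies pixels outside it unchanged.
    return [
        [1.0 - p if m else p for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def instruct_edit(
    image: Image,
    instruction: str,
    language_processor: Callable[[str], ParsedInstruction],
    segmenter: Callable[[Image, str], Mask],
    image_editor: Callable[[Image, Mask, ParsedInstruction], Image],
) -> Image:
    """Chain the three components in the order the abstract describes."""
    parsed = language_processor(instruction)
    mask = segmenter(image, parsed.segmentation_prompt)
    return image_editor(image, mask, parsed)


if __name__ == "__main__":
    img = [[0.0, 1.0], [0.25, 0.75]]
    result = instruct_edit(
        img, "change the dog to a cat",
        toy_language_processor, toy_segmenter, toy_editor,
    )
    print(result)
```

Passing the components as callables keeps each stage swappable, which mirrors how the abstract treats the language processor, segmenter, and image editor as separate modules; real models would replace the stubs behind the same interfaces.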


