ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation

08/02/2023
by Yasheng Sun, et al.

While language-guided image manipulation has made remarkable progress, the challenge of instructing the manipulation process so that it faithfully reflects human intentions persists. Describing a manipulation task accurately and comprehensively in natural language is laborious and sometimes impossible, primarily due to the inherent uncertainty and ambiguity of linguistic expressions. Is it feasible to accomplish image manipulation without resorting to external cross-modal language information? If so, the inherent modality gap would be effortlessly eliminated. In this paper, we propose a novel manipulation methodology, dubbed ImageBrush, that learns visual instructions for more accurate image editing. Our key idea is to employ a pair of transformation images as a visual instruction, which not only precisely captures human intention but is also readily obtainable in real-world scenarios. Learning from visual instructions is particularly challenging because it requires extracting the underlying intent solely from visual demonstrations and then applying the operation to a new image. To address this challenge, we formulate visual instruction learning as a diffusion-based inpainting problem, where contextual information is fully exploited through an iterative generation process. A visual prompting encoder is carefully devised to enhance the model's capacity to uncover the human intent behind the visual instructions. Extensive experiments show that our method generates engaging manipulation results that conform to the transformations entailed in the demonstrations. Moreover, our model exhibits robust generalization on various downstream tasks such as pose transfer, image translation, and video inpainting.
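To make the inpainting formulation concrete, below is a minimal PyTorch sketch of exemplar-based editing cast as grid inpainting: the exemplar pair (E, E') and the query image occupy three cells of a 2x2 canvas, and the fourth cell is filled in by iterative denoising while the known cells are re-imposed at every step. Everything here (DummyDenoiser, reverse_step, q_sample) is a hypothetical stand-in, not the authors' code or API; a real system would use a trained diffusion UNet together with the paper's visual prompting encoder.

```python
import torch


class DummyDenoiser:
    """Placeholder so the sketch runs end-to-end; a real model would be a
    trained diffusion UNet conditioned on the visual-prompt features."""

    def reverse_step(self, x, t):
        # Stand-in for one reverse-diffusion step x_t -> x_{t-1}.
        return x - 0.01 * torch.randn_like(x)

    def q_sample(self, x0, t):
        # Stand-in for the forward noising process q(x_t | x_0).
        return x0 + 0.001 * t * torch.randn_like(x0)


def make_canvas(e_before, e_after, query):
    """Arrange [E | E'] over [I | blank] into one canvas (B, C, 2H, 2W)."""
    blank = torch.zeros_like(query)                # unknown cell to generate
    top = torch.cat([e_before, e_after], dim=-1)
    bottom = torch.cat([query, blank], dim=-1)
    return torch.cat([top, bottom], dim=-2)


def make_mask(h, w, device):
    """1 where the canvas is known (kept fixed), 0 on the cell to inpaint."""
    mask = torch.ones(1, 1, 2 * h, 2 * w, device=device)
    mask[..., h:, w:] = 0.0
    return mask


@torch.no_grad()
def inpaint_edit(denoiser, e_before, e_after, query, timesteps):
    """Iteratively denoise the full canvas, re-imposing the (noised) known
    cells each step so only the edited result is actually generated."""
    _, _, h, w = query.shape
    known = make_canvas(e_before, e_after, query)
    mask = make_mask(h, w, query.device)
    x = torch.randn_like(known)                    # start from pure noise
    for t in reversed(timesteps):
        x = denoiser.reverse_step(x, t)
        noised_known = denoiser.q_sample(known, t)
        x = mask * noised_known + (1.0 - mask) * x
    return x[..., h:, w:]                          # crop the generated cell


if __name__ == "__main__":
    e0, e1, q = (torch.randn(1, 3, 64, 64) for _ in range(3))
    out = inpaint_edit(DummyDenoiser(), e0, e1, q, timesteps=range(50))
    print(out.shape)  # torch.Size([1, 3, 64, 64])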


Related research

03/06/2023 · Naming Objects for Vision-and-Language Manipulation
Robot manipulation tasks by natural language instructions need common un...

06/26/2023 · Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Despite the promising progress in multi-modal tasks, current large multi...

06/01/2017 · Grounding Symbols in Multi-Modal Instructions
As robots begin to cohabit with humans in semi-structured environments, ...

04/02/2022 · IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning
Conditional image generation is an active research topic including text2...

07/12/2023 · GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Language-Guided Robotic Manipulation (LGRM) is a challenging task as it ...

08/12/2018 · Language Guided Fashion Image Manipulation with Feature-wise Transformations
Developing techniques for editing an outfit image through natural senten...

09/21/2020 · SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning
Iterative Language-Based Image Editing (IL-BIE) tasks follow iterative i...
