Entity-Level Text-Guided Image Manipulation

02/22/2023
by Yikai Wang, et al.

Existing text-guided image manipulation methods aim to modify the appearance of an image or to edit a few objects in a virtual or simple scene, which is far from practical application. In this work, we study a novel task: text-guided image manipulation at the entity level in real-world images (eL-TGIM). The task imposes three basic requirements: (1) edit the entity consistently with the text description, (2) preserve the entity-irrelevant regions, and (3) merge the manipulated entity into the image naturally. To this end, we propose an elegant framework, dubbed SeMani, for Semantic Manipulation of real-world images, which can not only edit the appearance of entities but also generate new entities corresponding to the text guidance. To solve eL-TGIM, SeMani decomposes the task into two phases: a semantic alignment phase and an image manipulation phase. In the semantic alignment phase, SeMani incorporates a semantic alignment module to locate the entity-relevant region to be manipulated. In the image manipulation phase, SeMani adopts a generative model to synthesize new images conditioned on the entity-irrelevant regions and the target text description. We discuss two popular generation processes that can be instantiated in SeMani: discrete auto-regressive generation with transformers and continuous denoising generation with diffusion models, yielding SeMani-Trans and SeMani-Diff, respectively. We conduct extensive experiments on the real-world CUB, Oxford, and COCO datasets to verify that SeMani can distinguish entity-relevant from entity-irrelevant regions and achieve more precise and flexible manipulation in a zero-shot manner than baseline methods. Our code and models will be released at https://github.com/Yikai-Wang/SeMani.
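The two-phase decomposition can be sketched as follows. This is a minimal, illustrative Python sketch of the eL-TGIM pipeline described in the abstract, not the released SeMani implementation: `semantic_alignment` stands in for the cross-modal alignment module (here a trivial label-matching rule), and `manipulate` stands in for the generative model (here a stub callback), so all function names and the toy label "image" are assumptions.

```python
# Hypothetical sketch of the SeMani two-phase pipeline (names illustrative,
# not the released API).

def semantic_alignment(image, text):
    """Phase 1: return a binary mask marking the entity-relevant region.

    Stand-in rule: a cell is relevant when its semantic label occurs in the
    text. A real model would score cross-modal alignment between image
    regions and the text description.
    """
    return [[1 if label in text else 0 for label in row] for row in image]

def manipulate(image, mask, text, generate):
    """Phase 2: re-synthesize only the masked region with a generative model,
    conditioned on the text, leaving entity-irrelevant cells untouched
    (requirement (2) of eL-TGIM).
    """
    return [
        [generate(text) if m else px for px, m in zip(row_px, row_m)]
        for row_px, row_m in zip(image, mask)
    ]

# Toy "image" of semantic labels and a trivial generator stub.
image = [["sky", "bird"], ["grass", "bird"]]
mask = semantic_alignment(image, "a red bird")
edited = manipulate(image, mask, "a red bird", generate=lambda t: "red bird")
```

The point of the decomposition is that the generator never touches cells outside the mask, so entity-irrelevant content is preserved by construction rather than by a soft penalty.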


