Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

08/04/2020
by Xihui Liu, et al.

We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions. The task is challenging because image domains vary widely and no training supervision is available. Our approach takes advantage of a unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic to the image feature maps. A structure-preserving image decoder then generates the manipulated images from the manipulated feature maps. We further propose an on-the-fly, sample-specific optimization approach with cycle-consistency constraints that regularizes the manipulated images and forces them to preserve details of the source images. Our approach shows promising results in manipulating open-vocabulary color, texture, and high-level attributes across various open-domain image scenarios.
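
The abstract does not spell out the exact arithmetic, so the following is only a minimal sketch of what text-guided vector arithmetic on a feature map could look like. It assumes that image features and text embeddings share one space, that each spatial location is edited independently, and that the edit removes the component along a source-text direction and adds the same amount along a target-text direction. The function names and the `alpha` strength parameter are hypothetical and are not taken from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def edit_feature_map(feature_map, src_text, tgt_text, alpha=1.0):
    """Shift every location's feature from a source concept toward a target one.

    feature_map: list of per-location feature vectors (H*W entries, C dims each)
    src_text, tgt_text: text embeddings assumed to live in the same C-dim space
    alpha: hypothetical edit-strength parameter (0.0 leaves features unchanged)
    """
    t_src = normalize(src_text)
    t_tgt = normalize(tgt_text)
    edited = []
    for v in feature_map:
        # How strongly this location expresses the source concept, scaled by alpha.
        weight = alpha * dot(v, t_src)
        # Remove that amount along the source direction, add it along the target.
        edited.append(
            [vi - weight * s + weight * t for vi, s, t in zip(v, t_src, t_tgt)]
        )
    return edited


# Toy example: two locations in a 2-D embedding space.
features = [[1.0, 0.0], [0.0, 1.0]]
result = edit_feature_map(features, src_text=[2.0, 0.0], tgt_text=[0.0, 3.0])
```

In this toy case the first location, which lies entirely along the source direction, is moved onto the target direction, while the second location has no source component and stays as it was. A real system would obtain the features and text embeddings from the pretrained visual-semantic model and pass the edited map to the decoder.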


research
04/18/2022

VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

Generating and editing images from open domain text prompts is a challen...
research
03/24/2022

Semantic Image Manipulation with Background-guided Internal Learning

Image manipulation has attracted a lot of interest due to its wide range...
research
05/23/2023

3D Open-vocabulary Segmentation with Foundation Models

Open-vocabulary segmentation of 3D scenes is a fundamental function of h...
research
06/04/2022

Rethinking the Openness of CLIP

Contrastive Language-Image Pre-training (CLIP) has demonstrated great po...
research
03/25/2023

Learning video embedding space with Natural Language Supervision

The recent success of the CLIP model has shown its potential to be appli...
research
03/26/2017

Open Vocabulary Scene Parsing

Recognizing arbitrary objects in the wild has been a challenging problem...
research
04/09/2022

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation

Existing text-guided image manipulation methods aim to modify the appear...
