Interactive Image Manipulation with Complex Text Instructions

11/25/2022
by Ryugo Morita, et al.

Recently, text-guided image manipulation has received increasing attention in multimedia processing and computer vision research due to its high flexibility and controllability. Its goal is to semantically manipulate parts of an input reference image according to text descriptions. However, most existing works suffer from the following problems: (1) text-irrelevant content is not reliably preserved and may be changed at random, (2) manipulation quality still needs further improvement, and (3) only descriptive attributes can be manipulated. To address these problems, we propose a novel image manipulation method that interactively edits an image using complex text instructions. It allows users not only to improve the accuracy of image manipulation but also to accomplish complex tasks such as enlarging, shrinking, or removing objects and replacing the background of the input image. To make these tasks possible, we apply three strategies. First, the given image is divided into text-relevant and text-irrelevant content; only the text-relevant content is manipulated, while the text-irrelevant content is preserved. Second, a super-resolution method is used to enlarge the manipulation region, further improving operability and helping to manipulate the object itself. Third, a user interface is introduced for interactively editing the segmentation map, so the generated image can be revised according to the user's wishes. Extensive experiments on the Caltech-UCSD Birds-200-2011 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate that the proposed method enables interactive, flexible, and accurate image manipulation in real time. Qualitative and quantitative evaluations show that the proposed model outperforms other state-of-the-art methods.
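The first strategy described above, separating text-relevant from text-irrelevant content, amounts to mask-based compositing: generated pixels are used only where a segmentation mask marks the text-relevant region, and the original image is kept everywhere else. The sketch below illustrates that idea in NumPy; it is not the paper's implementation, and the function name and the random arrays are placeholders standing in for a real segmentation mask and the output of a real text-conditioned generator.

```python
# Minimal sketch of mask-based compositing, assuming a precomputed
# segmentation mask and generator output (both placeholders here).
import numpy as np

def composite(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend generated content into the original image.

    original, generated: float arrays of shape (H, W, 3) in [0, 1].
    mask: float array of shape (H, W, 1); 1 inside the text-relevant
          region, 0 in the text-irrelevant region that must be preserved.
    """
    return mask * generated + (1.0 - mask) * original

# Example: keep the background, replace only the masked object region.
h, w = 256, 256
original = np.random.rand(h, w, 3)    # reference image (placeholder data)
generated = np.random.rand(h, w, 3)   # text-conditioned generator output (placeholder data)
mask = np.zeros((h, w, 1))
mask[64:192, 64:192] = 1.0            # text-relevant region
edited = composite(original, generated, mask)
```

Because the segmentation map is the quantity the user edits through the interface (the third strategy), re-running this compositing step with a modified mask is one simple way to picture how an interactive re-edit could be applied.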

Related research

- BATINet: Background-Aware Text to Image Synthesis and Manipulation Network (08/11/2023)
- ManiGAN: Text-Guided Image Manipulation (12/12/2019)
- LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation (12/28/2021)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (03/31/2021)
- Disentangling Content and Motion for Text-Based Neural Video Manipulation (11/05/2022)
- ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation (04/09/2022)
- Semantic Photo Manipulation with a Generative Image Prior (05/15/2020)
