Improving Diffusion Models for Scene Text Editing with Dual Encoders

04/12/2023
by Jiabao Ji, et al.

Scene text editing is a challenging task that involves modifying or inserting specified text in an image while maintaining its natural and realistic appearance. Most previous approaches rely on style-transfer models that crop out text regions and feed them into image transfer models such as GANs. However, these methods are limited in their ability to change text style and cannot insert text into images. Recent advances in diffusion models have shown promise in overcoming these limitations through text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle to render correct text and to control text style. To address these problems, we propose DIFFSTE, which improves pre-trained diffusion models with a dual encoder design: a character encoder for better text legibility and an instruction encoder for better style control. We introduce an instruction tuning framework that trains the model to learn the mapping from a text instruction to the corresponding image, rendered either in the specified style or in the style of the surrounding text in the background. This training method also gives our model zero-shot generalization to three scenarios: generating text with unseen font variations (e.g., italic and bold), mixing different fonts to construct a new font, and following more relaxed forms of natural language instructions. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffSTE
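
To make the dual-input idea concrete, here is a minimal sketch of the two kinds of input the abstract describes: a natural-language instruction carrying the style, and a character-level view of the target text. Everything in it is an assumption for illustration only: the instruction template, the function names `build_instruction` and `encode_characters`, and the character vocabulary are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Optional

# Hypothetical character vocabulary (assumption): letters, digits, and space.
# Index 0 is reserved for characters outside the vocabulary.
_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "
CHAR_VOCAB = {ch: i + 1 for i, ch in enumerate(_CHARS)}
UNK_ID = 0


def build_instruction(text: str,
                      font: Optional[str] = None,
                      color: Optional[str] = None) -> str:
    """Build a style instruction string (hypothetical template).

    When no style is given, the instruction names only the text, which
    corresponds to the case where the style should follow the surrounding
    text in the image.
    """
    instruction = f'Write "{text}"'
    styles = []
    if font:
        styles.append(f"font {font}")
    if color:
        styles.append(f"color {color}")
    if styles:
        instruction += " in " + " and ".join(styles)
    return instruction


def encode_characters(text: str) -> List[int]:
    """Map each character to its own id, so spelling is explicit in the input."""
    return [CHAR_VOCAB.get(ch, UNK_ID) for ch in text]


if __name__ == "__main__":
    print(build_instruction("Hello", font="Arial", color="red"))
    print(encode_characters("Hello"))
```

In a real system the instruction string and the character ids would each go to a learned encoder whose outputs condition the diffusion model; this sketch stops at input preparation and does not reproduce the authors' implementation.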

research
04/28/2022

Russian Texts Detoxification with Levenshtein Editing

Text detoxification is a style transfer task of creating neutral version...
research
05/21/2023

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

We present an end-to-end diffusion-based method for editing videos with ...
research
05/18/2023

DiffUTE: Universal Text Editing Diffusion Model

Diffusion model based language-guided image editing has achieved great s...
research
05/09/2023

Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer

Large-scale text-to-video diffusion models have demonstrated an exceptio...
research
04/24/2023

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

The immense scale of the recent large language models (LLM) allows many ...
research
10/05/2021

De-rendering Stylized Texts

Editing raster text is a promising but challenging task. We propose to a...
research
03/13/2023

Erasing Concepts from Diffusion Models

Motivated by recent advancements in text-to-image diffusion, we study er...
