Conditional Text Image Generation with Diffusion Models

06/19/2023
by Yuanzhi Zhu, et al.

Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to capture real-world complexity and diversity by collecting and annotating enough real text images. In this paper, we explore the problem of text image generation by taking advantage of the ability of Diffusion Models to generate photo-realistic and diverse image samples under given conditions, and propose a method called Conditional Text Image Generation with Diffusion Models (CTIG-DM for short). To conform to the characteristics of text images, we devise three conditions: image condition, text condition, and style condition, which control the attributes, contents, and styles of the generated samples. Specifically, four text image generation modes, namely (1) synthesis mode, (2) augmentation mode, (3) recovery mode, and (4) imitation mode, can be derived by combining and configuring these three conditions. Extensive experiments on both handwritten and scene text demonstrate that the proposed CTIG-DM is able to produce image samples that simulate real-world complexity and diversity, and thus can boost the performance of existing text recognizers. In addition, CTIG-DM shows appealing potential in domain adaptation and in generating images containing Out-Of-Vocabulary (OOV) words.
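
The abstract describes three conditions (image, text, and style) that are switched on or off to obtain the four generation modes. The sketch below illustrates one way such optional conditioning could be wired up for a diffusion denoiser; the module names, dimensions, summation-based fusion, and learned null embeddings are assumptions made for illustration only, not the architecture used in the paper.

# A minimal PyTorch sketch of optional image/text/style conditioning for a
# diffusion denoiser. Everything here (module names, dimensions, summation
# fusion, null embeddings) is an illustrative assumption, not CTIG-DM's code.

import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Encode the three conditions named in the abstract into one vector.

    Each condition is optional: a learned "null" embedding stands in for any
    condition that a given generation mode leaves out, so the four modes
    (synthesis, augmentation, recovery, imitation) correspond to different
    on/off combinations of the inputs.
    """

    def __init__(self, dim: int = 256, vocab_size: int = 100, num_styles: int = 500):
        super().__init__()
        # Image condition: e.g. a degraded or reference text image.
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Text condition: character tokens of the target string.
        self.text_emb = nn.Embedding(vocab_size, dim)
        # Style condition: writer / style identity.
        self.style_emb = nn.Embedding(num_styles, dim)
        # Learned placeholders for conditions that are switched off.
        self.null = nn.Parameter(torch.zeros(3, dim))

    def forward(self, image=None, text=None, style=None):
        # At least one condition must be provided to infer the batch size.
        b = next(x.shape[0] for x in (image, text, style) if x is not None)
        img_c = self.image_enc(image) if image is not None else self.null[0].expand(b, -1)
        txt_c = self.text_emb(text).mean(dim=1) if text is not None else self.null[1].expand(b, -1)
        sty_c = self.style_emb(style) if style is not None else self.null[2].expand(b, -1)
        # Fuse by summation; the result would condition the denoising network.
        return img_c + txt_c + sty_c

if __name__ == "__main__":
    enc = ConditionEncoder()
    text = torch.randint(0, 100, (2, 8))   # dummy character ids, batch of 2
    style = torch.randint(0, 500, (2,))    # dummy style ids
    cond = enc(text=text, style=style)     # e.g. text + style, image switched off
    print(cond.shape)                      # torch.Size([2, 256])

Which conditions are active in each of the four modes is defined in the paper itself; the point of the sketch is only that a single encoder with optional inputs lets one model cover all of them.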

