Expressive Text-to-Image Generation with Rich Text

04/13/2023
by Songwei Ge, et al.

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as a precise RGB color value or the importance of each word. Furthermore, detailed text prompts for complex scenes are tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnotes. We extract each word's attributes from the rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region from the cross-attention maps of a vanilla diffusion process run on the plain text. For each region, we then enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines in quantitative evaluations.
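The region-extraction step described above can be sketched as follows. This is an illustrative assumption of how per-word masks might be derived from cross-attention maps, not the paper's exact implementation: `token_region_masks`, the per-pixel argmax assignment, and the `threshold` parameter are hypothetical choices for this sketch.

```python
import numpy as np

def token_region_masks(attn_maps, threshold=0.3):
    """Assign each spatial location to the token with the highest
    cross-attention score, keeping only locations whose winning score
    exceeds `threshold`. (Illustrative sketch; the paper's method may
    differ in normalization and mask refinement.)

    attn_maps: (H, W, T) array of cross-attention scores for T tokens,
               assumed normalized over tokens at each location.
    Returns a list of T boolean masks of shape (H, W).
    """
    winner = attn_maps.argmax(axis=-1)    # (H, W): strongest token per pixel
    strength = attn_maps.max(axis=-1)     # (H, W): the winning score itself
    return [(winner == t) & (strength > threshold)
            for t in range(attn_maps.shape[-1])]

# Toy example: a 4x4 attention map over 2 tokens, where the left half
# of the image attends to token 0 and the right half to token 1.
attn = np.zeros((4, 4, 2))
attn[:, :2, 0] = 0.9; attn[:, :2, 1] = 0.1
attn[:, 2:, 0] = 0.2; attn[:, 2:, 1] = 0.8
masks = token_region_masks(attn)
```

Each resulting mask would then delimit the region where that word's rich-text attributes (color, style, footnote prompt) are enforced via region-specific guidance during denoising.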


