Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

03/24/2022
by   Oran Gafni, et al.
8

Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains. While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain unanswered, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene, (ii) introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions (faces and salient objects), and (iii) adapting classifier-free guidance for the transformer use case. Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels, significantly improving visual quality. Through scene controllability, we introduce several new capabilities: (i) Scene editing, (ii) text editing with anchor scenes, (iii) overcoming out-of-distribution text prompts, and (iv) story illustration generation, as demonstrated in the story we wrote.

READ FULL TEXT

page 1

page 3

page 5

page 6

page 14

page 15

page 16

page 17

research
10/27/2022

ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts

Recent progress in diffusion models has revolutionized the popular techn...
research
11/25/2022

SpaText: Spatio-Textual Representation for Controllable Image Generation

Recent text-to-image diffusion models are able to generate convincing re...
research
02/17/2023

Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales

The quality of text-to-image generation is continuously improving, yet t...
research
06/22/2023

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields

Editing a local region or a specific object in a 3D scene represented by...
research
05/24/2022

M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing

The fashion industry has diverse applications in multi-modal image gener...
research
10/22/2021

MIGS: Meta Image Generation from Scene Graphs

Generation of images from scene graphs is a promising direction towards ...
research
08/17/2023

Watch Your Steps: Local Image and Scene Editing by Text Instructions

Denoising diffusion models have enabled high-quality image generation an...

Please sign up or login with your details

Forgot password? Click here to reset