Implementing and Experimenting with Diffusion Models for Text-to-Image Generation

09/22/2022
by Robin Zbinden, et al.

Taking advantage of the many recent advances in deep learning, text-to-image generative models are currently attracting the attention of the general public. Two of these models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images can be generated from a simple textual description of an image. Based on a novel approach to image generation called diffusion models, text-to-image models enable the production of many different types of high-resolution images, where human imagination is the only limit. However, these models require exceptionally large amounts of computational resources to train, as well as huge datasets collected from the internet. In addition, neither the codebase nor the models have been released, which prevents the AI community from experimenting with these cutting-edge models and makes the reproduction of their results complicated, if not impossible. In this thesis, we aim to contribute by first reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model. Largely based on DALL-E 2, our implementation introduces several slight modifications to tackle the high computational cost involved. We thus have the opportunity to experiment in order to understand what these models are capable of, especially in a low-resource regime. In particular, we provide additional and deeper analyses than the ones performed by the authors of DALL-E 2, including ablation studies. Moreover, diffusion models use so-called guidance methods to help the generation process. We introduce a new guidance method which can be used in conjunction with other guidance methods to improve image quality. Finally, the images generated by our model are of reasonably good quality, without having to sustain the significant training costs of state-of-the-art text-to-image models.
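
As background for the guidance methods mentioned above, the following is a minimal Python sketch of classifier-free guidance, a standard guidance technique for diffusion models that can be combined with other guidance methods; the abstract does not specify the thesis's own new method, so this illustrates only the general mechanism. The guided_eps helper, the denoiser signature, and the tensor shapes are hypothetical placeholders, not the authors' actual code.

    import torch

    def guided_eps(denoiser, x_t, t, text_emb, scale=7.5):
        """Classifier-free guidance: blend unconditional and text-conditioned
        noise estimates. A scale > 1 strengthens adherence to the text prompt."""
        eps_uncond = denoiser(x_t, t, cond=None)       # unconditional prediction
        eps_cond = denoiser(x_t, t, cond=text_emb)     # text-conditioned prediction
        return eps_uncond + scale * (eps_cond - eps_uncond)

    # Toy usage: a dummy denoiser stands in for a trained text-conditioned U-Net.
    dummy_denoiser = lambda x, t, cond: torch.randn_like(x)
    x_t = torch.randn(1, 3, 64, 64)                    # current noisy image
    eps = guided_eps(dummy_denoiser, x_t,
                     t=torch.tensor([50]),             # diffusion timestep
                     text_emb=torch.randn(1, 512))     # placeholder text embedding

The guided noise estimate eps is then used in place of the plain conditional prediction at each denoising step of the sampler.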
