RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

09/02/2023
by Fengxiang Bie, et al.

Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images from text descriptions. Text-to-image generation with neural networks can be traced back to the emergence of the Generative Adversarial Network (GAN), followed by the autoregressive Transformer. Diffusion models are a prominent type of generative model that generate images by learning to reverse a process in which noise is systematically added over repeated steps. Owing to their impressive results in image synthesis, diffusion models have become the dominant image decoder in text-to-image models and have brought text-to-image generation to the forefront of machine learning (ML) research. In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models, producing images that are nearly indistinguishable from real-world images and revolutionizing the way we retrieve images. Our exploratory study leads us to believe that there are further ways to scale text-to-image models by combining innovative model architectures with prediction enhancement techniques. We divide this survey into five main sections, in which we detail the frameworks of the major works in order to examine the different types of text-to-image generation methods. We then provide a detailed comparison and critique of these methods and suggest possible pathways for improvement. Looking ahead, we argue that TTI development could yield impressive productivity gains for creative work, particularly in the context of the AIGC era, and could be extended to more complex tasks such as video generation and 3D generation.
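The abstract describes diffusion models only in a single sentence. As a rough illustration of the forward (noising) half of that process, here is a minimal Python sketch assuming a DDPM-style formulation with a linear variance schedule. The function names and the schedule values (1e-4 to 0.02 over 1000 steps) are illustrative assumptions, not details taken from the survey. The learned part of a diffusion model, a network trained to undo these noising steps, is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Jump straight to step t (0-indexed) of the forward process in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I).
    Returns the noised values and the noise that was added."""
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]
    return xt, eps

if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    x0 = [0.5, -0.2, 0.9]  # a toy three-"pixel" image (hypothetical)
    for t in (0, 250, 999):
        xt, _ = q_sample(x0, t, abar, rng)
        print(t, round(abar[t], 5), [round(v, 3) for v in xt])
```

As `t` grows, `abar[t]` shrinks toward zero, so the sample is dominated by Gaussian noise. A generative model is then trained to predict the added noise (or the clean image) so the process can be run in reverse, starting from pure noise.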
