Vision + Language Applications: A Survey

05/24/2023
by Yutong Zhou, et al.

Text-to-image generation has attracted significant interest from researchers and practitioners in recent years because of its diverse applications across many industries. Despite the progress in vision and language research, the existing literature remains relatively limited, particularly regarding advancements and applications in this field. This paper explores a relevant research track within multimodal applications, including text, vision, audio, and others. Beyond the studies discussed in this paper, we also continually update the latest relevant papers, datasets, application projects, and corresponding information at https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image


