Vision-and-Language Pretraining

07/05/2022
by   Thong Nguyen, et al.

With the burgeoning amount of image-text pair data and the diversity of Vision-and-Language (V&L) tasks, scholars have introduced an abundance of deep learning models in this research domain. Furthermore, in recent years, transfer learning has shown tremendous success in Computer Vision for tasks such as Image Classification and Object Detection, and in Natural Language Processing for Question Answering, Machine Translation, etc. Inheriting the spirit of transfer learning, research in V&L has devised multiple pretraining techniques on large-scale datasets to enhance performance on downstream tasks. This article aims to provide a comprehensive review of contemporary V&L pretraining models. In particular, we categorize and delineate pretraining approaches, and summarize state-of-the-art vision-and-language pretrained models. Moreover, a list of training datasets and downstream tasks is supplied to further contextualize V&L pretraining. Lastly, we discuss numerous directions for future research.


