Survey: Transformer based Video-Language Pre-training

09/21/2021
by Ludan Ruan et al.

Inspired by the success of transformer-based pre-training methods on natural language tasks and, subsequently, computer vision tasks, researchers have begun to apply transformers to video processing. This survey gives a comprehensive overview of transformer-based pre-training methods for video-language learning. We first briefly introduce the transformer structure as background knowledge, including the attention mechanism and position encoding. We then describe the typical pre-training and fine-tuning paradigm for video-language processing in terms of proxy tasks, downstream tasks, and commonly used video datasets. Next, we categorize transformer models into single-stream and multi-stream structures, highlight their innovations, and compare their performance. Finally, we analyze and discuss the current challenges and possible future research directions for video-language pre-training.
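As context for the transformer background the abstract mentions, below is a minimal sketch (an illustration written for this summary, not code from the paper) of the two ingredients it names: scaled dot-product attention and sinusoidal position encoding. All function names, shapes, and the toy input are assumptions chosen for the example.

```python
# Minimal sketch (NumPy) of scaled dot-product attention and sinusoidal
# position encoding, the background components named in the abstract.
# Names and shapes are illustrative assumptions, not from the paper.
import numpy as np

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sin/cos position encoding from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    i = np.arange(d_model)[None, :]                         # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even dimensions get sine, odd dimensions get cosine.
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)          # (seq_q, seq_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)               # row-wise softmax
    return weights @ v                                      # (seq_q, d_v)

# Toy usage: self-attention over 4 frame features of dimension 8.
x = np.random.randn(4, 8) + sinusoidal_position_encoding(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                            # (4, 8)
```

In the video-language models the survey categorizes, the same self-attention would run either over concatenated video and text embeddings (single-stream) or over each modality separately with a cross-modal fusion step (multi-stream).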
