Multi-stage Pre-training over Simplified Multimodal Pre-training Models

07/22/2021
by Tongtong Liu, et al.

Multimodal pre-training models, such as LXMERT, have achieved excellent results in downstream tasks. However, current pre-trained models require large amounts of training data and have huge model sizes, which makes them difficult to apply in low-resource situations. How to obtain similar or even better performance than a larger model with less pre-training data and a smaller model size has become an important problem. In this paper, we propose a new Multi-stage Pre-training (MSP) method, which uses information at different granularities, from word and phrase to sentence, in both texts and images to pre-train the model in stages. We also design several pre-training tasks suited to the information granularity at each stage, in order to efficiently capture diverse knowledge from a limited corpus. We take a Simplified LXMERT (LXMERT-S), which has only 45.9% of the parameters and 11.76% of the pre-training data of the original LXMERT model, as the testbed of our MSP method. Experimental results show that our method achieves performance comparable to the original LXMERT model on all downstream tasks, and even outperforms the original model on the Image-Text Retrieval task.
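As a rough illustration of what such a staged schedule could look like, the sketch below drives a word-level, then phrase-level, then sentence-level pre-training stage in sequence. This is a hypothetical sketch, not the authors' implementation: the granularity ordering follows the abstract, but the task names, the Stage structure, and the dummy train_step are placeholder assumptions.

```python
# Minimal, illustrative sketch of a multi-stage pre-training schedule.
# Stage and task names are hypothetical placeholders, not the paper's code.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str          # granularity of this stage: "word", "phrase", or "sentence"
    tasks: List[str]   # pre-training objectives used at this granularity
    epochs: int        # how long to train before moving to the next stage

def run_msp(stages: List[Stage], train_step: Callable[[str], float]) -> None:
    """Run the stages in order, keeping the same model weights throughout."""
    for stage in stages:
        for _ in range(stage.epochs):
            for task in stage.tasks:
                train_step(task)  # one optimization pass on this objective
        print(f"finished stage '{stage.name}'")

# Example word -> phrase -> sentence schedule with placeholder task names.
schedule = [
    Stage("word", ["masked_lm", "masked_object_prediction"], epochs=1),
    Stage("phrase", ["phrase_region_alignment"], epochs=1),
    Stage("sentence", ["image_text_matching"], epochs=1),
]
run_msp(schedule, train_step=lambda task: 0.0)  # dummy step for illustration
```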

