Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

11/17/2022
by Weijie Su, et al.

To effectively exploit the potential of large-scale models, various pre-training strategies supported by massive data from different sources have been proposed, including supervised pre-training, weakly-supervised pre-training, and self-supervised pre-training. It has been shown that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models. However, current works adopt a multi-stage pre-training system, where the complex pipeline may increase the uncertainty and instability of pre-training. It is thus desirable that these strategies be integrated in a single-stage manner. In this paper, we first propose a general multi-modal mutual information formula as a unified optimization target and demonstrate that all existing approaches are special cases of our framework. Under this unified perspective, we propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training). Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, COCO object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation. Notably, we successfully pre-train a billion-parameter image backbone and achieve state-of-the-art performance on various benchmarks. Code shall be released at https://github.com/OpenGVLab/M3I-Pretraining.
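
As an illustration of the mutual-information view of multi-modal pre-training, the minimal sketch below uses a symmetric InfoNCE contrastive loss, a standard lower bound on the mutual information between two batches of paired representations (for example, image and text embeddings). This is an assumption for illustration only: the function name info_nce_loss, the placeholder embeddings feat_img and feat_txt, and the temperature value are hypothetical and do not reproduce the paper's actual M3I objective.

import torch
import torch.nn.functional as F

def info_nce_loss(feat_a, feat_b, temperature=0.07):
    # Symmetric InfoNCE: cross-entropy over cosine similarities, where
    # matching pairs lie on the diagonal and other samples act as negatives.
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    logits = feat_a @ feat_b.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(feat_a.size(0), device=feat_a.device)
    loss_a = F.cross_entropy(logits, targets)        # a -> b direction
    loss_b = F.cross_entropy(logits.t(), targets)    # b -> a direction
    return 0.5 * (loss_a + loss_b)

# Hypothetical usage: embeddings from an image encoder and a text encoder.
feat_img = torch.randn(32, 256)
feat_txt = torch.randn(32, 256)
loss = info_nce_loss(feat_img, feat_txt)

Minimizing this loss tightens a lower bound on the mutual information between the two modalities, which is the general kind of objective the abstract refers to.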

Related research

04/27/2023
Retrieval-based Knowledge Augmented Vision Language Pre-training
With recent progress in large-scale vision and language representation l...

07/19/2022
Self-Supervision Can Be a Good Few-Shot Learner
Existing few-shot learning (FSL) methods rely on training with a large l...

04/25/2020
Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection
In this paper, we propose a general and efficient pre-training paradigm,...

06/11/2022
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Multi-modal pre-training and knowledge discovery are two important resea...

12/01/2022
FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Video synthesis methods rapidly improved in recent years, allowing easy ...

08/03/2022
Learning Prior Feature and Attention Enhanced Image Inpainting
Many recent inpainting works have achieved impressive results by leverag...

12/22/2022
Reversible Column Networks
We propose a new neural network design paradigm Reversible Column Networ...
