GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training

08/08/2022
by Jaeseok Byun, et al.

Most existing vision-and-language pre-training (VLP) methods have mainly focused on how to extract and align vision and text features. In contrast to the mainstream VLP methods, we highlight that two routinely applied steps during pre-training have a crucial impact on the performance of the pre-trained model: in-batch hard negative sampling for image-text matching (ITM) and assigning a large masking probability for masked language modeling (MLM). After empirically demonstrating the unexpected effectiveness of the above two steps, we systematically devise GRIT-VLP, which adaptively samples mini-batches for more effective mining of hard negative samples for ITM while maintaining the computational cost of pre-training. Our method consists of three components: 1) a GRouped mIni-baTch sampling (GRIT) strategy that collects similar examples in a mini-batch, 2) an ITC consistency loss for improving the mining ability, and 3) an enlarged masking probability for MLM. Consequently, GRIT-VLP achieves new state-of-the-art performance on various downstream tasks at a much lower computational cost. Furthermore, we demonstrate that our model is essentially on par with ALBEF, the previous state of the art, with only one-third of the training epochs on the same training data. Code is available at https://github.com/jaeseokbyun/GRIT-VLP.
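The core GRIT idea, filling each mini-batch with mutually similar examples so that in-batch ITM negatives are hard, can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function name `grouped_minibatches` and the greedy nearest-neighbor chaining over cosine similarity are assumptions made here for clarity, and the paper's actual grouping strategy differs in detail.

```python
import numpy as np

def grouped_minibatches(embeddings, batch_size):
    """Order examples by greedy nearest-neighbor chaining on cosine
    similarity, then chunk the ordering into mini-batches. Each batch
    then holds mutually similar examples, so in-batch negatives for
    ITM are hard. (Hypothetical sketch, not the GRIT-VLP code.)"""
    # L2-normalize rows so dot products are cosine similarities
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unused = list(range(1, len(e)))
    order = [0]  # start the chain at example 0
    while unused:
        # similarity of the chain's last example to all remaining ones
        sims = e[order[-1]] @ e[unused].T
        order.append(unused.pop(int(np.argmax(sims))))
    # chunk the similarity-ordered indices into mini-batches
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

With two well-separated clusters, the chaining keeps each cluster's members in the same batch, so contrastive negatives within a batch are near-duplicates rather than easy random pairs.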


