BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs

06/06/2023
by   Zhen Yang, et al.

In-batch contrastive learning is a state-of-the-art self-supervised method that brings semantically similar instances close together while pushing dissimilar instances apart within a mini-batch. The key to its success is the negative sharing strategy, in which every instance serves as a negative for the others within the mini-batch. Recent studies aim to improve performance by sampling hard negatives within the current mini-batch, but the quality of those negatives is bounded by the mini-batch itself. In this work, we propose to improve contrastive learning by sampling the mini-batches themselves from the input data. We present BatchSampler (code available at https://github.com/THUDM/BatchSampler) to sample mini-batches of hard-to-distinguish instances, i.e., instances that are hard and true negatives to each other. To reduce the number of false negatives in each mini-batch, we design a proximity graph over randomly selected instances. To form the mini-batch, we use random walk with restart on the proximity graph to sample hard-to-distinguish instances. BatchSampler is a simple and general technique that can be plugged directly into existing contrastive learning models in vision, language, and graphs. Extensive experiments on datasets from the three modalities show that BatchSampler consistently improves the performance of powerful contrastive models, with significant improvements for SimCLR on ImageNet-100, SimCSE on STS (language), and GraphCL and MVGRL on graph datasets.
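The two steps named in the abstract, a proximity graph over randomly selected instances and a random walk with restart to form the mini-batch, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `build_proximity_graph` and `sample_batch_rwr`, the parameters `num_candidates`, `num_neighbors`, and `restart_prob`, the use of cosine similarity, and the uniform fallback fill are all assumptions made for illustration.

```python
import numpy as np


def build_proximity_graph(embeddings, num_candidates, num_neighbors, rng):
    """For each instance, draw random candidates and keep the most similar ones.

    Assumption: similarity is cosine similarity between current embeddings.
    Requires num_neighbors <= num_candidates <= len(embeddings) - 1.
    Returns an (n, num_neighbors) array of neighbor indices.
    """
    n = embeddings.shape[0]
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    neighbors = np.empty((n, num_neighbors), dtype=np.int64)
    for i in range(n):
        # Draw candidates from the n - 1 other instances (no self-loops).
        pool = rng.choice(n - 1, size=num_candidates, replace=False)
        pool[pool >= i] += 1
        sims = normed[pool] @ normed[i]
        top = np.argsort(-sims)[:num_neighbors]
        neighbors[i] = pool[top]
    return neighbors


def sample_batch_rwr(neighbors, batch_size, restart_prob, rng, max_steps=10_000):
    """Collect distinct instances visited by a random walk with restart."""
    n = neighbors.shape[0]
    seed = int(rng.integers(n))
    batch, seen = [seed], {seed}
    current = seed
    for _ in range(max_steps):
        if len(batch) == batch_size:
            break
        if rng.random() < restart_prob:
            current = seed  # restart: jump back to the seed instance
        else:
            current = int(rng.choice(neighbors[current]))  # step to a neighbor
        if current not in seen:
            seen.add(current)
            batch.append(current)
    if len(batch) < batch_size:
        # Assumed fallback: fill the rest uniformly from unvisited instances.
        remaining = np.setdiff1d(np.arange(n), batch)
        extra = rng.choice(remaining, size=batch_size - len(batch), replace=False)
        batch.extend(int(x) for x in extra)
    return np.array(batch)
```

In this sketch, drawing candidates at random before keeping the nearest ones limits how close a neighbor can be, which is one plausible reading of how a proximity graph could lower the share of false negatives. A higher `restart_prob` keeps the walk near the seed, so the batch stays local to the seed's neighborhood, while a lower value lets the walk drift further across the graph.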

research
07/12/2023

Mini-Batch Optimization of Contrastive Loss

Contrastive learning has gained significant attention as a method for se...
research
12/14/2019

Cross-Batch Memory for Embedding Learning

Mining informative negative instances are of central importance to deep ...
research
02/08/2021

Improving memory banks for unsupervised learning with large mini-batch, consistency and hard negative mining

An important component of unsupervised learning by instance-based discri...
research
01/17/2023

USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval

As a fundamental and challenging task in bridging language and vision do...
research
06/05/2023

LibAUC: A Deep Learning Library for X-Risk Optimization

This paper introduces the award-winning deep learning (DL) library calle...
research
09/29/2021

Contrastive Video-Language Segmentation

We focus on the problem of segmenting a certain object referred by a nat...
research
03/07/2020

Adaptive Offline Quintuplet Loss for Image-Text Matching

Existing image-text matching approaches typically leverage triplet loss ...
