Investigating the Role of Negatives in Contrastive Representation Learning

by Jordan T. Ash, et al.

Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters, such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding of how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
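To make the setup above concrete, here is a minimal Python sketch under a toy model chosen for illustration only, not the paper's analysis: concepts are equiprobable, each negative's concept is drawn i.i.d. uniformly, and similarity is a plain dot product. The helper names `contrastive_loss`, `collision_probability`, and `coverage_probability` are hypothetical. The first scores one positive against k negatives with a softmax cross-entropy; the other two show the two sides of the trade-off, where a collision is a negative that shares the anchor's concept and coverage means every other concept appears among the negatives.

```python
import math


def contrastive_loss(anchor, positive, negatives):
    """Cross-entropy of picking `positive` out of the 1 + k candidates.

    Scores are plain dot products (an assumption for this sketch; many
    pipelines use cosine similarity divided by a temperature instead).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    scores = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    # -log softmax(scores)[0]: the positive sits at index 0.
    return log_z - scores[0]


def collision_probability(num_concepts, k):
    """P(at least one of k negatives shares the anchor's concept).

    Toy model (assumption): each negative's concept is drawn i.i.d.
    uniformly from `num_concepts` equiprobable concepts.
    """
    return 1.0 - (1.0 - 1.0 / num_concepts) ** k


def coverage_probability(num_concepts, k):
    """P(every concept other than the anchor's appears among k negatives).

    Same toy model; computed exactly by inclusion-exclusion over the set
    of other concepts that are missed by all k draws.
    """
    others = num_concepts - 1
    total = sum(
        (-1) ** j * math.comb(others, j) * ((num_concepts - j) / num_concepts) ** k
        for j in range(others + 1)
    )
    # Clamp tiny floating-point error from the alternating sum.
    return max(0.0, min(1.0, total))


if __name__ == "__main__":
    num_concepts = 10
    for k in (5, 10, 20, 40, 80):
        print(
            f"k={k:3d}  collision={collision_probability(num_concepts, k):.3f}  "
            f"coverage={coverage_probability(num_concepts, k):.3f}"
        )
```

Under this toy model both probabilities rise with k. Covering all C - 1 other concepts only becomes likely once k is on the order of C log C (the coupon-collector regime), and at k ≈ C ln C the collision probability is already about 1 - 1/C. This illustrates why the useful number of negatives is tied to the number of concepts; it is a sketch, not the paper's bound.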




Do More Negative Samples Necessarily Hurt in Contrastive Learning?

Recent investigations in noise contrastive estimation suggest, both empi...

Sharp Learning Bounds for Contrastive Unsupervised Representation Learning

Contrastive unsupervised representation learning (CURL) encourages data ...

Debiased Contrastive Learning

A prominent technique for self-supervised representation learning has be...

Hard Negative Sampling Strategies for Contrastive Representation Learning

One of the challenges in contrastive learning is the selection of approp...

Scaling Deep Contrastive Learning Batch Size with Almost Constant Peak Memory Usage

Contrastive learning has been applied successfully to learn numerical ve...

Understanding Hard Negatives in Noise Contrastive Estimation

The choice of negative examples is important in noise contrastive estima...

DSReg: Using Distant Supervision as a Regularizer

In this paper, we aim at tackling a general issue in NLP tasks where som...