Understanding the Behaviour of Contrastive Loss

by Feng Wang, et al.

Unsupervised contrastive learning has achieved outstanding success, but the mechanism of the contrastive loss has been less studied. In this paper, we concentrate on understanding the behaviour of the unsupervised contrastive loss. We show that the contrastive loss is a hardness-aware loss function, and that the temperature τ controls the strength of the penalties on hard negative samples. Previous work has shown that uniformity is a key property of contrastive learning. We build relations between uniformity and the temperature τ. We show that uniformity helps contrastive learning to learn separable features; however, excessive pursuit of uniformity makes the contrastive loss intolerant of semantically similar samples, which may break the underlying semantic structure and harm the formation of features useful for downstream tasks. This is caused by an inherent defect of the instance discrimination objective. Specifically, the instance discrimination objective tries to push all different instances apart, ignoring the underlying relations between samples. Pushing semantically consistent samples apart has no positive effect on acquiring a prior that is informative for general downstream tasks. A well-designed contrastive loss should have some degree of tolerance to the closeness of semantically similar samples. We therefore find that the contrastive loss faces a uniformity-tolerance dilemma, and that a good choice of temperature can balance these two properties, learning separable features while remaining tolerant to semantically similar samples, which improves feature quality and downstream performance.
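The hardness-aware behaviour described in the abstract can be illustrated with a small sketch. It assumes the commonly used InfoNCE (softmax) form of the contrastive loss for a single anchor; the abstract itself does not spell out the formula, so this form, the function names, and the example similarity values are all assumptions made for illustration. Under that form, the gradient with respect to each negative similarity is proportional to exp(s_k / τ), so the relative penalty across negatives is a softmax of the similarities scaled by 1/τ.

```python
import math


def info_nce_loss(pos_sim, neg_sims, tau):
    """Assumed InfoNCE-style loss for one anchor: -log softmax of the positive."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


def negative_penalty_shares(neg_sims, tau):
    """Relative share of the gradient magnitude that each negative receives.

    For the loss above, d(loss)/d(s_k) is proportional to exp(s_k / tau),
    so normalising over the negatives gives softmax(neg_sims / tau).
    """
    scaled = [s / tau for s in neg_sims]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical cosine similarities between an anchor and four negatives,
# ordered from hardest (most similar) to easiest.
neg_sims = [0.9, 0.5, 0.0, -0.5]

for tau in (0.07, 0.5, 5.0):
    shares = negative_penalty_shares(neg_sims, tau)
    print(f"tau={tau}: " + ", ".join(f"{s:.3f}" for s in shares))
```

With a small τ nearly all of the penalty falls on the hardest negative, while with a large τ the penalty is spread almost evenly across negatives. This matches the abstract's claim that τ controls how strongly hard negatives are penalised, although the sketch says nothing about downstream performance.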


Model-Aware Contrastive Learning: Towards Escaping Uniformity-Tolerance Dilemma in Training

Instance discrimination contrastive learning (CL) has achieved significa...

Sharp Learning Bounds for Contrastive Unsupervised Representation Learning

Contrastive unsupervised representation learning (CURL) encourages data ...

Are all negatives created equal in contrastive instance discrimination?

Self-supervised learning has recently begun to rival supervised learning...

Siamese Prototypical Contrastive Learning

Contrastive Self-supervised Learning (CSL) is a practical solution that ...

Hierarchical Semantic Aggregation for Contrastive Representation Learning

Self-supervised learning based on instance discrimination has shown rema...

Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

Contrastive learning (CL) is widely known to require many negative sampl...

Temperature as Uncertainty in Contrastive Learning

Contrastive learning has demonstrated great capability to learn represen...