Dissecting Supervised Contrastive Learning

02/17/2021
by   Florian Graf, et al.

Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question of whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior. The number of iterations required to perfectly fit the data scales superlinearly with the number of randomly flipped labels for the supervised contrastive loss. This is in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.
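
For concreteness, below is a minimal NumPy sketch of a supervised contrastive (SupCon-style) objective of the kind the abstract refers to: each sample is pulled toward all other samples of the same class and pushed away from the rest of the batch. The function name, temperature value, and batch layout are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def supervised_contrastive_loss(z, labels, tau=0.1):
        # z: (n, d) L2-normalized encoder outputs; labels: (n,) integer class labels;
        # tau: temperature. Illustrative sketch, not the authors' exact implementation.
        n = z.shape[0]
        sim = z @ z.T / tau                     # scaled pairwise similarities
        exp_sim = np.exp(sim)
        np.fill_diagonal(exp_sim, 0.0)          # exclude self-pairs from the denominator
        log_denom = np.log(exp_sim.sum(axis=1))

        loss = 0.0
        for i in range(n):
            positives = np.where(labels == labels[i])[0]
            positives = positives[positives != i]
            if positives.size == 0:
                continue                        # samples with no same-class partner are skipped
            # average log-probability of pulling sample i toward its positives
            loss += -np.mean(sim[i, positives] - log_denom[i])
        return loss / n

    # example usage with random, L2-normalized embeddings (illustrative only)
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
    print(supervised_contrastive_loss(z, labels))

In this sketch, z would be the L2-normalized output of the encoder, so the similarities are cosine similarities; the temperature tau controls how sharply the loss concentrates on the hardest negatives.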

