CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

by Andreas Fürst, et al.

Contrastive learning with the InfoNCE objective is exceptionally successful in various self-supervised learning tasks. Recently, the CLIP model yielded impressive results on zero-shot transfer learning when using InfoNCE for learning visual representations from natural language supervision. However, InfoNCE, as a lower bound on the mutual information, has been shown to perform poorly when the mutual information is high. In contrast, the InfoLOOB upper bound (leave one out bound) works well for high mutual information but suffers from large variance and instabilities. We introduce "Contrastive Leave One Out Boost" (CLOOB), where modern Hopfield networks boost learning with the InfoLOOB objective. Modern Hopfield networks replace the original embeddings with retrieved embeddings in the InfoLOOB objective. The retrieved embeddings give InfoLOOB two assets. Firstly, the retrieved embeddings stabilize InfoLOOB, since they are less noisy and more similar to one another than the original embeddings. Secondly, they are enriched by correlations, since the covariance structure of the embeddings is reinforced through retrievals. We compare CLOOB to CLIP after training on the Conceptual Captions and YFCC datasets with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.
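The two ingredients the abstract describes can be sketched concretely: a single retrieval step of a modern Hopfield network (a softmax-weighted average over stored patterns) and the InfoLOOB objective, whose denominator leaves out the positive pair and sums only over negatives. Below is a minimal NumPy sketch, not the authors' implementation; the batch size, embedding dimension, inverse temperature `beta`, and temperature `tau` are illustrative assumptions, and only one of CLOOB's two symmetric retrieval/loss terms is shown.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def hopfield_retrieve(queries, memory, beta=8.0):
    # One update step of a modern Hopfield network: each query is
    # replaced by a softmax(beta * similarity)-weighted average of
    # the stored memory patterns, then re-normalized.
    logits = beta * queries @ memory.T                    # (n, m)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return l2_normalize(weights @ memory)

def info_loob(a, b, tau=0.3):
    # InfoLOOB: log of positive-pair score over the sum of
    # *negative* scores only -- the positive is left out of
    # the denominator (unlike InfoNCE).
    n = a.shape[0]
    sims = a @ b.T / tau                                  # (n, n)
    pos = np.diag(sims)
    mask = ~np.eye(n, dtype=bool)
    neg = np.exp(sims)[mask].reshape(n, n - 1).sum(axis=1)
    return float(np.mean(np.log(neg) - pos))

rng = np.random.default_rng(0)
x = l2_normalize(rng.normal(size=(8, 16)))   # image embeddings (toy)
y = l2_normalize(rng.normal(size=(8, 16)))   # text embeddings (toy)

# CLOOB idea: retrieve both modalities from the image memory x,
# then score the retrieved (less noisy, correlation-enriched)
# embeddings with InfoLOOB.
u_x = hopfield_retrieve(x, x)
u_y = hopfield_retrieve(y, x)
loss = info_loob(u_x, u_y)
```

Because both the image and the text embedding are retrieved from the same memory, they share that memory's covariance structure, which is how retrieval "enriches correlations" and stabilizes the otherwise high-variance InfoLOOB bound.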


