On the Importance of Contrastive Loss in Multimodal Learning

04/07/2023
by Yunwei Ren, et al.

Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2021)) have achieved great success in multimodal learning, where the model tries to minimize the distance between the representations of different views (e.g., an image and its caption) of the same data point while keeping the representations of different data points far apart. However, from a theoretical perspective, it is unclear how contrastive learning can learn the representations from different views efficiently, especially when the data is not isotropic. In this work, we analyze the training dynamics of a simple multimodal contrastive learning model and show that contrastive pairs are important for the model to efficiently balance the learned representations. In particular, we show that positive pairs drive the model to align the representations at the cost of increasing the condition number, while negative pairs reduce the condition number, keeping the learned representations balanced.
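
To make the roles of positive and negative pairs concrete, the following is a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss in plain Python. It is an illustration only, not the simplified model analyzed in the paper. The function names, the temperature of 0.1, and the toy two-dimensional embeddings are all assumptions chosen for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def contrastive_loss(view_a, view_b, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Entry (i, i) of the similarity matrix is a positive pair (two views of
    the same data point); every off-diagonal entry is a negative pair.
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    # Cosine similarities scaled by the (assumed) temperature.
    logits = [
        [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        for ai in a
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy "image" and "caption" embeddings (hypothetical values).
image_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Break the pairing by rotating the captions one position.
text_shuffled = [text_aligned[1], text_aligned[2], text_aligned[0]]

loss_aligned = contrastive_loss(image_emb, text_aligned)
loss_shuffled = contrastive_loss(image_emb, text_shuffled)
print(loss_aligned < loss_shuffled)
```

In this sketch the diagonal terms reward alignment of positive pairs, while the log-sum-exp over each row and column penalizes similarity to negatives, so correctly paired embeddings give a lower loss than mismatched ones.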


