Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data

02/13/2023
by Ryumei Nakada, et al.

Language-supervised vision models have recently attracted great attention in computer vision. A common approach to building such models is to apply contrastive learning on paired data across the two modalities, as exemplified by Contrastive Language-Image Pre-Training (CLIP). In this paper, under a linear representation setting, (i) we initiate the investigation of a general class of nonlinear loss functions for multimodal contrastive learning (MMCL), including the CLIP loss, and show its connection to singular value decomposition (SVD): each step of loss minimization by gradient descent can be seen as performing SVD on a contrastive cross-covariance matrix. Based on this insight, (ii) we analyze the performance of MMCL and show quantitatively that its feature learning ability can exceed that of unimodal contrastive learning applied to each modality separately, even in the presence of wrongly matched pairs; this characterizes the robustness of MMCL to noisy data. Furthermore, when additional unpaired data are available, (iii) we propose a new MMCL loss that incorporates unpaired datasets. We show that the resulting algorithm can detect the ground-truth pairs and improve performance by fully exploiting the unpaired data. Numerical experiments verify the performance of the proposed algorithm.
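To make the SVD connection concrete, here is a minimal NumPy sketch of the linear setting the abstract describes. All specifics (dimensions, noise level, and the exact form of the contrastive cross-covariance, taken here as the centered cross-covariance of matched pairs) are illustrative assumptions, not the paper's construction: two modalities share a low-rank latent signal, and the top singular subspaces of the cross-covariance matrix yield linear encoders for each modality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear two-modality setup (dimensions are assumptions for illustration).
n, d1, d2, r = 500, 10, 8, 3

# A shared latent signal Z drives both modalities, plus independent noise.
Z = rng.normal(size=(n, r))
A1 = rng.normal(size=(r, d1))
A2 = rng.normal(size=(r, d2))
X = Z @ A1 + 0.1 * rng.normal(size=(n, d1))
Y = Z @ A2 + 0.1 * rng.normal(size=(n, d2))

# A simple stand-in for the contrastive cross-covariance: matched pairs
# minus the average mismatched pair, i.e. the centered cross-covariance.
C = (X - X.mean(0)).T @ (Y - Y.mean(0)) / n

# SVD of C: the top-r left/right singular vectors give rank-r linear
# representations for each modality, aligned with the shared signal.
U, s, Vt = np.linalg.svd(C)
W1, W2 = U[:, :r].T, Vt[:r]

print(np.round(s[:r], 2))  # leading singular values carry the shared signal
print(np.round(s[r:], 2))  # trailing singular values sit near the noise floor
```

The clear gap between the leading `r` singular values and the rest is what a gradient step on the linear contrastive loss would exploit, per the abstract's claim that each step effectively performs SVD on this matrix.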


