Factorized Contrastive Learning: Going Beyond Multi-view Redundancy

06/08/2023
by Paul Pu Liang et al.

In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach because it can learn representations from abundant unlabeled data given only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy: that the information shared between modalities is necessary and sufficient for downstream tasks. In many real-world settings, however, task-relevant information also lies in modality-unique regions: information present in only one modality yet still relevant to the task. How can we learn self-supervised multimodal representations that capture both the shared and the unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method that goes beyond multi-view redundancy. FactorCL rests on three new contributions: (1) factorizing task-relevant information into shared and unique representations; (2) capturing task-relevant information by maximizing mutual information (MI) lower bounds and removing task-irrelevant information by minimizing MI upper bounds; and (3) multimodal data augmentations that approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.
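To make the "maximizing MI lower bounds" ingredient concrete, here is a minimal NumPy sketch of the standard InfoNCE lower bound on mutual information between two batches of paired embeddings, the building block that contrastive objectives like those in FactorCL optimize. This is an illustrative sketch, not the paper's implementation: the function name, temperature value, and batch construction are assumptions, and FactorCL additionally combines such lower bounds with MI upper bounds, which this snippet does not show.

```python
import numpy as np

def infonce_lower_bound(z1, z2, temperature=0.1):
    """InfoNCE lower bound on MI between paired embedding batches.

    z1, z2: arrays of shape (batch, dim); row i of z1 is paired
    with row i of z2 (the positives); all other rows are negatives.
    Returns a scalar bounded above by log(batch).
    """
    # Normalize embeddings so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (batch, batch) similarity matrix

    # Row-wise log-softmax; the positive pair sits on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # InfoNCE: I(X; Y) >= log(N) + E[log p(positive | row)].
    return log_probs.diagonal().mean() + np.log(len(z1))
```

With identical views the diagonal dominates and the estimate approaches its ceiling of log(batch); with independent views it falls toward zero, which is why larger batches tighten the bound.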


Related Research

03/14/2022
Rethinking Minimal Sufficient Representation in Contrastive Learning
Contrastive learning between different views of the data achieves outsta...

03/16/2023
Identifiability Results for Multimodal Contrastive Learning
Contrastive learning is a cornerstone underlying recent progress in mult...

08/26/2022
MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion
Multi-view representation learning is essential for many multi-view task...

01/23/2023
Zorro: the masked multimodal transformer
Attention-based models are appealing for multimodal processing because i...

06/07/2023
Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications
In many machine learning systems that jointly learn from multiple modali...

01/30/2022
Contrastive Learning from Demonstrations
This paper presents a framework for learning visual representations from...

07/06/2021
Contrastive Multimodal Fusion with TupleInfoNCE
This paper proposes a method for representation learning of multimodal d...
