How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning?

11/23/2022
by Thomas M. Hehn, et al.

Various state-of-the-art self-supervised visual representation learning approaches take advantage of data from multiple sensors by aligning the feature representations across views and/or modalities. In this work, we investigate how aligning representations affects the visual features obtained from cross-view and cross-modal contrastive learning on images and point clouds. On five real-world datasets and five tasks, we train and evaluate 108 models based on four pretraining variations. We find that cross-modal representation alignment discards complementary visual information, such as color and texture, and instead emphasizes redundant depth cues. The depth cues obtained from pretraining improve downstream depth prediction performance. Overall, cross-modal alignment also leads to more robust encoders than pretraining by cross-view alignment, especially on depth prediction, instance segmentation, and object detection.

research
02/21/2022

Vision-Language Pre-Training with Triple Contrastive Learning

Vision-language representation learning largely benefits from image-text...

research
09/30/2022

ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training

Recent Vision-Language Pre-trained (VLP) models based on dual encoder ha...

research
10/13/2022

X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation

Bird's-eye-view (BEV) grid is a common representation for the perception...

research
05/08/2023

Vision Langauge Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation

Cross-modal contrastive learning in vision language pretraining (VLP) fa...

research
09/08/2023

3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Pretraining molecular representations from large unlabeled data is essen...

research
02/05/2023

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Mainstream 3D representation learning approaches are built upon contrast...

research
09/05/2022

Design of the topology for contrastive visual-textual alignment

Pre-training weakly related image-text pairs in the contrastive style sh...