TriCoLo: Trimodal Contrastive Loss for Fine-grained Text to Shape Retrieval

01/19/2022
by Yue Ruan, et al.

Recent work on contrastive losses for learning joint embeddings over multimodal data has been successful at downstream tasks such as retrieval and classification. On the other hand, work on joint representation learning for 3D shapes and text has thus far mostly focused on improving embeddings through modeling of complex attention between representations, or through multi-task learning. We show that large-batch contrastive learning alone achieves state-of-the-art (SoTA) performance on text-shape retrieval, without complex attention mechanisms or losses. Prior work on 3D and text representations has also focused on bimodal representation learning, pairing text with either voxels or multi-view images. In contrast, we propose a trimodal learning scheme over text, voxels, and multi-view images that achieves even higher performance and learns better representations for all modalities.
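As a rough illustration of the idea, the sketch below shows a CLIP-style symmetric contrastive (InfoNCE) loss and one plausible way to extend it to three modalities by summing the pairwise losses. Several details are assumptions and are not taken from the paper: the function names, the temperature of 0.07, the equal weighting of the three pairs, and the use of NumPy in place of a training framework. Row `i` of each embedding matrix is treated as the same shape (its text description, voxel grid, and multi-view images). The other `N - 1` rows in the batch act as negatives, which is why a larger batch supplies more negatives per example.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project each embedding onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(a, b, temperature=0.07):
    """Hypothetical CLIP-style loss: (a[i], b[i]) are positives, all other rows are negatives."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature            # (N, N) scaled cosine similarities
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # retrieve b given a
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # retrieve a given b
    return 0.5 * (loss_ab + loss_ba)

def trimodal_loss(text, voxel, image, temperature=0.07):
    """Assumed trimodal objective: unweighted sum of the three pairwise contrastive losses."""
    return (symmetric_info_nce(text, voxel, temperature)
            + symmetric_info_nce(text, image, temperature)
            + symmetric_info_nce(voxel, image, temperature))

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 32))
aligned = trimodal_loss(text, text, text)      # all three modalities agree
unaligned = trimodal_loss(text, rng.normal(size=(8, 32)), rng.normal(size=(8, 32)))
print(f"aligned={aligned:.4f} unaligned={unaligned:.4f}")
```

When the three embeddings of each shape coincide, the loss is near zero. With unrelated random embeddings, each pairwise term sits near `log(N)`, so the aligned loss is much lower. Summing pairwise terms is only one design choice; a real system would compute these losses on encoder outputs and backpropagate through them.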

research
10/28/2022

Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

Modality representation learning is an important problem for multimodal ...
research
07/17/2022

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

Large-scale Vision-and-Language (V+L) pre-training for representation le...
research
03/16/2023

Identifiability Results for Multimodal Contrastive Learning

Contrastive learning is a cornerstone underlying recent progress in mult...
research
06/08/2021

Contrastive Representation Learning for Hand Shape Estimation

This work presents improvements in monocular hand shape estimation by bu...
research
11/14/2022

The Role of Local Alignment and Uniformity in Image-Text Contrastive Learning on Medical Images

Image-text contrastive learning has proven effective for pretraining med...
research
07/08/2021

Staying in Shape: Learning Invariant Shape Representations using Contrastive Learning

Creating representations of shapes that are invariant to isometric or a...
research
12/11/2022

Using Multiple Instance Learning to Build Multimodal Representations

Image-text multimodal representation learning aligns data across modalit...