GMC – Geometric Multimodal Contrastive Representation Learning

02/07/2022
by   Petra Poklukar, et al.

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method composed of two main components: i) a two-level architecture consisting of modality-specific base encoders, which map an arbitrary number of modalities to intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
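
The two components described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the modality names, the dimensions, the linear-plus-tanh base encoders, the temperature, and the use of a generic InfoNCE-style loss as a stand-in for the multimodal contrastive loss, whose exact form the abstract does not give.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two modalities with different input dimensions.
INPUT_DIMS = {"image": 12, "sound": 5}
INTERMEDIATE_DIM = 8   # fixed size produced by every base encoder
LATENT_DIM = 4         # size of the shared latent space

# Level 1: one base encoder per modality (assumed here to be a linear map).
base_encoders = {
    name: rng.normal(size=(dim, INTERMEDIATE_DIM)) / np.sqrt(dim)
    for name, dim in INPUT_DIMS.items()
}
# Level 2: a single projection head shared by all modalities.
projection_head = rng.normal(size=(INTERMEDIATE_DIM, LATENT_DIM)) / np.sqrt(INTERMEDIATE_DIM)


def encode(name, x):
    """Map a batch of one modality to unit-norm latent vectors."""
    h = np.tanh(x @ base_encoders[name])   # intermediate representation
    z = h @ projection_head                # shared latent space
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: row i of z_a should be closest to row i of z_b."""
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))


# Usage: a batch of 6 paired observations, one array per modality.
batch = {name: rng.normal(size=(6, dim)) for name, dim in INPUT_DIMS.items()}
loss = contrastive_loss(encode("image", batch["image"]), encode("sound", batch["sound"]))
print(f"contrastive loss on untrained random weights: {loss:.3f}")
```

Because every modality ends in the same latent space, either encoding can be used alone when the other modality is missing at test time, which is the robustness property the abstract targets. In this sketch the weights are random and untrained; minimizing the loss over them would pull paired representations together and push unpaired ones apart.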
