Learning Factorized Multimodal Representations

06/16/2018
by Yao-Hung Hubert Tsai, et al.

Learning representations of multimodal data is a fundamentally complex research problem due to the presence of multiple sources of information. To address the complexities of multimodal data, we argue that suitable representation learning models should: 1) factorize representations according to independent factors of variation in the data, 2) capture important features for discriminative tasks, 3) capture important features for generative tasks, and 4) couple both modality-specific and multimodal information. To encapsulate all these properties, we propose the Multimodal Factorization Model (MFM), which factorizes multimodal representations into two sets of independent factors: multimodal discriminative factors and modality-specific generative factors. Multimodal discriminative factors are shared across all modalities and contain the joint multimodal features required for discriminative tasks such as predicting sentiment. Modality-specific generative factors are unique to each modality and contain the information required for generating data. Our experimental results show that our model learns meaningful multimodal representations and achieves state-of-the-art or competitive performance on five multimodal datasets. Our model also demonstrates flexible generative capabilities by conditioning on the independent factors. We further interpret our factorized representations to understand the interactions that influence multimodal learning.
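
To make the factorization concrete, the following is a minimal, untrained sketch of the structure the abstract describes: one shared factor feeding a discriminative predictor, and one generative factor per modality feeding that modality's decoder. It is not the authors' architecture. The class name `FactorizedSketch`, the random linear maps, the dimensions, and the modality names (`language`, `audio`, `visual`) are all assumptions chosen for illustration.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim); stands in for a learned network."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class FactorizedSketch:
    """Hypothetical illustration of a shared/modality-specific factor split."""

    def __init__(self, modality_dims, shared_dim=4, specific_dim=3, num_outputs=1, seed=0):
        rng = random.Random(seed)
        self.modality_dims = dict(modality_dims)
        total_dim = sum(self.modality_dims.values())
        # One encoder over all modalities yields the shared discriminative factor.
        self.shared_encoder = linear(total_dim, shared_dim, rng)
        # One encoder per modality yields that modality's generative factor.
        self.specific_encoders = {
            m: linear(d, specific_dim, rng) for m, d in self.modality_dims.items()
        }
        # Each decoder sees the shared factor plus its own modality's factor.
        self.decoders = {
            m: linear(shared_dim + specific_dim, d, rng)
            for m, d in self.modality_dims.items()
        }
        # The predictor (e.g. for sentiment) sees only the shared factor.
        self.predictor = linear(shared_dim, num_outputs, rng)

    def encode(self, inputs):
        joint = [x for m in self.modality_dims for x in inputs[m]]
        shared = apply(self.shared_encoder, joint)
        specific = {m: apply(self.specific_encoders[m], inputs[m]) for m in self.modality_dims}
        return shared, specific

    def predict(self, shared):
        return apply(self.predictor, shared)

    def generate(self, shared, specific):
        return {m: apply(self.decoders[m], shared + specific[m]) for m in self.modality_dims}


# Assumed toy input: three modalities with arbitrary feature sizes.
model = FactorizedSketch({"language": 5, "audio": 3, "visual": 4})
inputs = {
    "language": [0.1, 0.2, 0.3, 0.4, 0.5],
    "audio": [1.0, -1.0, 0.5],
    "visual": [0.0, 0.3, 0.6, 0.9],
}
shared, specific = model.encode(inputs)
prediction = model.predict(shared)
reconstruction = model.generate(shared, specific)

# Conditioning on the factors: swap only the audio factor and regenerate.
swapped = dict(specific)
swapped["audio"] = [1.0, 1.0, 1.0]
regenerated = model.generate(shared, swapped)
```

In this sketch, changing the audio factor alters only the generated audio features, while the prediction depends on the shared factor alone. That separation is what the abstract means by conditioning on independent factors, though a trained model would also need objectives that encourage the factors to be independent, which this sketch omits.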

