Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

05/29/2018
by   Wei-Ning Hsu, et al.

Multimodal sensory data resembles the form of information humans perceive when learning, and is easy to obtain in large quantities. Compared to unimodal data, the synchronization of concepts between modalities provides supervision for disentangling the underlying explanatory factors of each modality. Previous work leveraging multimodal data has mainly focused on retaining only the modality-invariant factors while discarding the rest. In this paper, we present a partitioned variational autoencoder (PVAE) and several training objectives to learn disentangled representations, which encode not only the shared factors but also modality-dependent ones into separate latent variables. Specifically, PVAE integrates a variational inference framework with a multimodal generative model that partitions the explanatory factors and conditions only on the relevant subset of them for generation. We evaluate our model on two parallel speech/image datasets and demonstrate its ability to learn disentangled representations by qualitatively exploring within-modality and cross-modality conditional generation, with semantics and styles specified by examples. For quantitative analysis, we evaluate the classification accuracy of automatically discovered semantic units. Our PVAE achieves over 99% accuracy on both modalities.
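The latent partitioning described in the abstract can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch and not the paper's implementation: all dimensions, encoder/decoder shapes, and variable names (`z_s` for shared semantics, `z_pa`/`z_pb` for per-modality styles) are assumptions chosen for clarity, and the linear Gaussian encoders stand in for the paper's neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): each modality is encoded into
# a shared latent z_s (semantics synchronized across modalities) and a
# modality-private latent z_p (style factors of that modality alone).
D_X, D_SHARED, D_PRIV = 16, 4, 3

def make_encoder(d_in, d_out):
    """Toy linear Gaussian encoder: returns (mean, log-variance) heads."""
    w_mu = rng.normal(scale=0.1, size=(d_out, d_in))
    w_lv = rng.normal(scale=0.1, size=(d_out, d_in))
    return lambda x: (w_mu @ x, w_lv @ x)

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the standard VAE reparameterization)."""
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

# Separate shared- and private-latent encoders per modality.
enc_shared_a = make_encoder(D_X, D_SHARED)   # e.g. speech
enc_priv_a   = make_encoder(D_X, D_PRIV)
enc_priv_b   = make_encoder(D_X, D_PRIV)     # e.g. image

# Each decoder conditions only on the relevant subset of factors:
# the shared semantics plus that modality's own private style.
dec_a = rng.normal(scale=0.1, size=(D_X, D_SHARED + D_PRIV))
dec_b = rng.normal(scale=0.1, size=(D_X, D_SHARED + D_PRIV))

x_a, x_b = rng.normal(size=D_X), rng.normal(size=D_X)

z_s  = reparameterize(*enc_shared_a(x_a))    # shared semantics from speech
z_pa = reparameterize(*enc_priv_a(x_a))      # speech-specific style
z_pb = reparameterize(*enc_priv_b(x_b))      # image-specific style

# Within-modality reconstruction, and cross-modality generation by pairing
# one modality's shared code with the other modality's private style.
recon_a = dec_a @ np.concatenate([z_s, z_pa])
cross_b = dec_b @ np.concatenate([z_s, z_pb])  # image with speech semantics
```

Swapping which example supplies `z_s` versus `z_p` is what enables the conditional generation experiments the abstract mentions: semantics from one example, style from another, within or across modalities.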


