A survey of multimodal deep generative models

07/05/2022
by   Masahiro Suzuki, et al.
0

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and cross-modal generation via these representations; however, achieving this requires taking the heterogeneous nature of multimodal data into account. In recent years, deep generative models, i.e., generative models in which distributions are parameterized by deep neural networks, have attracted much attention, especially variational autoencoders, which are suitable for accomplishing the above challenges because they can consider heterogeneity and infer good representations of data. Therefore, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we provide a categorized survey of studies on multimodal deep generative models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/11/2019

Multimodal Generative Models for Compositional Representation Learning

As deep neural networks become more adept at traditional tasks, many of ...
research
11/08/2019

Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models

Learning generative models that span multiple data modalities, such as v...
research
07/02/2020

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Multimodal learning for generative models often refers to the learning o...
research
06/21/2020

VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Deep generative models often perform poorly in real-world applications d...
research
10/27/2022

Deep Generative Models on 3D Representations: A Survey

Generative models, as an important family of statistical modeling, targe...
research
11/14/2020

Speech Prediction in Silent Videos using Variational Autoencoders

Understanding the relationship between the auditory and visual signals i...
research
09/07/2022

Benchmarking Multimodal Variational Autoencoders: GeBiD Dataset and Toolkit

Multimodal Variational Autoencoders (VAEs) have been a subject of intens...

Please sign up or login with your details

Forgot password? Click here to reset