On the Limitations of Multimodal VAEs

10/08/2021
by Imant Daunhawer, et al.

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite the advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied to datasets more complex than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
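To make "sub-sampling of modalities" concrete, here is a minimal sketch of a generic mixture-based multimodal objective: a weighted average of per-subset ELBOs, where each subset's posterior sees only some of the modalities while the decoder must still reconstruct all of them. This is an illustration under stated assumptions, not the authors' code. It assumes two modalities, singleton subsets only, uniform subset weights (in the style of MMVAE-like mixtures), and linear Gaussian encoders and decoders. All helper names (`make_linear_encoder`, `subset_elbo`, `mixture_elbo`, etc.) are hypothetical.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def gaussian_log_likelihood(x, mean, var=1.0):
    """log N(x; mean, var * I), summed over feature dims (broadcasts over samples)."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def make_linear_encoder(W, logvar_value=-1.0):
    """Hypothetical unimodal encoder: mean = x @ W, fixed diagonal log-variance."""
    def encode(x):
        mu = x @ W
        return mu, np.full_like(mu, logvar_value)
    return encode

def make_linear_decoder(V):
    """Hypothetical decoder: Gaussian mean = z @ V (unit variance assumed)."""
    return lambda z: z @ V

def subset_elbo(xs, m, encoders, decoders, rng, n_samples=64):
    """Monte Carlo ELBO for the singleton subset S = {m}: the posterior sees
    only x_m, but the decoders must reconstruct every modality from z."""
    mu, logvar = encoders[m](xs[m])                      # (batch, d_z)
    eps = rng.standard_normal((n_samples,) + mu.shape)   # reparameterization noise
    z = mu + np.exp(0.5 * logvar) * eps                  # (n_samples, batch, d_z)
    # Reconstruction term summed over ALL modalities, including unobserved ones.
    rec = sum(gaussian_log_likelihood(x, dec(z)) for x, dec in zip(xs, decoders))
    kl = gaussian_kl_to_standard_normal(mu, logvar)      # (batch,)
    return float(np.mean(rec.mean(axis=0) - kl))

def mixture_elbo(xs, encoders, decoders, weights, rng):
    """Mixture-based objective: weighted average of the per-subset ELBOs."""
    per_subset = np.array([subset_elbo(xs, m, encoders, decoders, rng)
                           for m in range(len(xs))])
    return float(np.dot(weights, per_subset)), per_subset

# Toy data: two modalities built from a shared latent plus modality-specific noise.
rng = np.random.default_rng(0)
batch, d_z, dims = 32, 2, (3, 4)
shared = rng.standard_normal((batch, d_z))
xs = [shared @ rng.standard_normal((d_z, d)) + 0.5 * rng.standard_normal((batch, d))
      for d in dims]

encoders = [make_linear_encoder(0.3 * rng.standard_normal((d, d_z))) for d in dims]
decoders = [make_linear_decoder(rng.standard_normal((d_z, d))) for d in dims]
weights = np.full(len(xs), 1.0 / len(xs))  # assumed uniform weights over subsets

elbo, per_subset = mixture_elbo(xs, encoders, decoders, weights, rng)
print("per-subset ELBOs:", per_subset)
print("mixture ELBO:", elbo)
```

In this toy setup, the latent inferred from `x_1` alone carries no information about the modality-specific noise in `x_2`, so the cross-modal reconstruction term cannot fully explain `x_2`. Intuitively, this is the kind of term the upper bound described above concerns; the sketch does not reproduce the paper's proof.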

research
07/02/2020

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Multimodal learning for generative models often refers to the learning o...
research
02/14/2018

Multimodal Generative Models for Scalable Weakly-Supervised Learning

Multiple modalities often co-occur when describing natural phenomena. Le...
research
05/25/2023

Score-Based Multimodal Autoencoders

Multimodal Variational Autoencoders (VAEs) represent a promising group o...
research
12/11/2019

Multimodal Generative Models for Compositional Representation Learning

As deep neural networks become more adept at traditional tasks, many of ...
research
06/09/2019

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Future prediction is a fundamental principle of intelligence that helps ...
research
09/07/2022

Benchmarking Multimodal Variational Autoencoders: GeBiD Dataset and Toolkit

Multimodal Variational Autoencoders (VAEs) have been a subject of intens...
research
05/06/2021

Generalized Multimodal ELBO

Multiple data types naturally co-occur when describing real-world phenom...