Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization

06/09/2022
by Adrián Javaloy, et al.

A number of variational autoencoders (VAEs) have recently emerged with the aim of modeling multimodal data, e.g., to jointly model images and their corresponding captions. Still, multimodal VAEs tend to focus on only a subset of the modalities, e.g., by fitting the image while neglecting the caption. We refer to this limitation as modality collapse. In this work, we argue that this effect is a consequence of conflicting gradients during multimodal VAE training. We show how to detect the sub-graphs of the computational graph where gradients conflict (impartiality blocks), and how to leverage existing gradient-conflict solutions from multitask learning to mitigate modality collapse, that is, to ensure impartial optimization across modalities. We apply our training framework to several multimodal VAE models, losses, and datasets from the literature, and empirically show that it significantly improves reconstruction performance, conditional generation, and the coherence of the latent space across modalities.
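
To make the notion of conflicting gradients concrete, the following is a minimal sketch and not the paper's implementation. It assumes two hypothetical per-modality gradients with respect to shared parameters (`g_image`, `g_caption`), flags a conflict when their cosine similarity is negative, and applies a projection in the style of PCGrad, one known gradient-conflict method from multitask learning. The abstract does not say which methods the framework uses, so the choice here is purely illustrative, as are all function names.

```python
import math

def dot(u, v):
    # Inner product of two equally sized vectors.
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; a negative value means the gradients point in
    # opposing directions, i.e. they conflict.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def project_out_conflict(g, h):
    # PCGrad-style step (illustrative): if g conflicts with h, remove the
    # component of g that lies along h; otherwise return g unchanged.
    d = dot(g, h)
    if d >= 0:
        return list(g)
    scale = d / dot(h, h)
    return [a - scale * b for a, b in zip(g, h)]

# Hypothetical gradients of each modality's loss w.r.t. shared parameters.
g_image = [1.0, 0.0]
g_caption = [-1.0, 1.0]

conflict = cosine(g_image, g_caption) < 0

# Project each gradient against the other, then sum for the shared update.
g_image_fixed = project_out_conflict(g_image, g_caption)
g_caption_fixed = project_out_conflict(g_caption, g_image)
update = [a + b for a, b in zip(g_image_fixed, g_caption_fixed)]
```

After the projection, neither adjusted gradient has a negative inner product with the other modality's original gradient, so the combined update no longer favors one modality at the other's expense in this toy case.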

research
05/25/2023

Score-Based Multimodal Autoencoders

Multimodal Variational Autoencoders (VAEs) represent a promising group o...
research
04/11/2022

Mixture-of-experts VAEs can disregard variation in surjective multimodal data

Machine learning systems are often deployed in domains that entail data ...
research
05/19/2023

Improving Multimodal Joint Variational Autoencoders through Normalizing Flows and Correlation Analysis

We propose a new multimodal variational autoencoder that enables to gene...
research
01/26/2018

Improving Bi-directional Generation between Different Modalities with Variational Autoencoders

We investigate deep generative models that can exchange multiple modalit...
research
10/22/2022

Greedy Modality Selection via Approximate Submodular Maximization

Multimodal learning considers learning from multi-modality data, aiming ...
research
07/27/2023

Cortex Inspired Learning to Recover Damaged Signal Modality with ReD-SOM Model

Recent progress in the fields of AI and cognitive sciences opens up new ...
research
06/11/2023

Multimodal Pathology Image Search Between H&E Slides and Multiplexed Immunofluorescent Images

We present an approach for multimodal pathology image search, using dyna...
