MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

05/10/2023
by Jianbin Zheng, et al.

Existing multimodal conditional image synthesis (MCIS) methods generate images conditioned on arbitrary combinations of modalities, but they require all of the modalities to conform exactly to one another, which hinders synthesis controllability and leaves cross-modal potential under-exploited. To address this, we propose generating images conditioned on compositions of multimodal control signals in which the modalities are imperfectly complementary, a task we call composed multimodal conditional image synthesis (CMCIS). We observe two challenging issues in the proposed CMCIS task: the modality coordination problem and the modality imbalance problem. To tackle them, we introduce a Mixture-of-Modality-Tokens Transformer (MMoT) that adaptively fuses fine-grained multimodal control signals, a multimodal balanced training loss that stabilizes the optimization of each modality, and a multimodal sampling guidance that balances the strength of each modality's control signal. Comprehensive experimental results demonstrate that MMoT achieves superior performance on both unimodal conditional image synthesis (UCIS) and MCIS tasks, with high-quality and faithful image synthesis under complex multimodal conditions. The project website is available at https://jabir-zheng.github.io/MMoT.
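
The abstract does not spell out how the multimodal sampling guidance is computed. As a rough illustration only, the sketch below assumes it resembles classifier-free guidance extended with one scale per modality: the unconditional prediction is shifted toward each modality's conditional prediction by a modality-specific weight. The function name `guided_logits`, the modality names, and the formula itself are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def guided_logits(
    uncond: List[float],
    cond: Dict[str, List[float]],
    scales: Dict[str, float],
) -> List[float]:
    """Combine unconditional logits with per-modality conditional logits.

    Assumed formula (hypothetical, not confirmed by the source):
        out = uncond + sum_m scales[m] * (cond[m] - uncond)
    A modality missing from `scales` defaults to a scale of 1.0.
    """
    out = list(uncond)
    for name, logits in cond.items():
        if len(logits) != len(uncond):
            raise ValueError(f"logit length mismatch for modality {name!r}")
        scale = scales.get(name, 1.0)
        for i, (c, u) in enumerate(zip(logits, uncond)):
            # Push the prediction toward this modality, weighted by its scale.
            out[i] += scale * (c - u)
    return out


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = {"text": [1.0, 1.0], "sketch": [0.0, 3.0]}
    # A larger scale gives that modality a stronger pull on the result.
    print(guided_logits(uncond, cond, {"text": 2.0, "sketch": 0.5}))
```

Under this assumption, setting every scale to zero recovers the unconditional prediction, and raising one modality's scale strengthens that control signal relative to the others, which is the kind of balancing the abstract describes.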
