Consistent Multimodal Generation via A Unified GAN Framework

07/04/2023
by   Zhen Zhu, et al.

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are both realistic and consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also present a training recipe for easily extending our pretrained model to a new domain, even when only a small amount of paired data is available. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available at https://github.com/jessemelpolio/MultimodalGAN.
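To make the discriminator setup in the abstract more concrete, here is a minimal sketch of how the outputs of several per-modality fidelity discriminators and one cross-modality consistency discriminator might be combined into a single generator objective. It is an illustration under stated assumptions, not the authors' implementation: the non-saturating logistic form is the one commonly used with StyleGAN-family models, and the function names, the `consistency_weight` parameter, and the equal weighting of modalities are hypothetical.

```python
import math

# Modalities named in the abstract.
MODALITIES = ("rgb", "depth", "normal")


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_loss(fidelity_logits, consistency_logit, consistency_weight=1.0):
    """Non-saturating generator loss over all critics (assumed form).

    fidelity_logits: dict mapping each modality to the logit its own
        fidelity discriminator assigns to the generated image.
    consistency_logit: logit the consistency discriminator assigns to the
        generated (rgb, depth, normal) tuple taken together.
    consistency_weight: hypothetical trade-off weight, not from the paper.
    """
    fidelity = sum(softplus(-fidelity_logits[m]) for m in MODALITIES)
    consistency = softplus(-consistency_logit)
    return fidelity + consistency_weight * consistency


def discriminator_loss(real_logit, fake_logit):
    """Logistic loss for one discriminator: score real high, fake low."""
    return softplus(-real_logit) + softplus(fake_logit)
```

In this sketch, each fidelity term only sees one modality, so it can push that output toward realism, while the consistency term sees all modalities jointly and penalizes tuples that do not look like they describe the same scene.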

research
07/03/2022

You Only Need One Detector: Unified Object Detector for Different Modalities based on Vision Transformers

Most systems use different models for different modalities, such as one ...
research
01/21/2018

Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs

Scene recognition with RGB images has been extensively studied and has r...
research
06/22/2021

Fine-Tuning StyleGAN2 For Cartoon Face Generation

Recent studies have shown remarkable success in the unsupervised image t...
research
04/06/2021

Teacher-Student Adversarial Depth Hallucination to Improve Face Recognition

We present the Teacher-Student Generative Adversarial Network (TS-GAN) t...
research
01/22/2020

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

We present a new multimodal question answering challenge, ManyModalQA, i...
research
07/13/2020

Low to High Dimensional Modality Hallucination using Aggregated Fields of View

Real-world robotics systems deal with data from a multitude of modalitie...
research
02/14/2022

A Survey of Cross-Modality Brain Image Synthesis

The existence of completely aligned and paired multi-modal neuroimaging ...
