Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation

10/19/2021
by   Bin Ren, et al.

Previous cross-view image translation methods that directly adopt a simple encoder-decoder or U-Net structure struggle to generate a good image at the target view, especially when the views differ drastically or the deformation is severe. To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and a refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between the image code and the semantic map code via our novel CrossMLP blocks, and the coarse results are then generated progressively under the guidance of those cues. In the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem by applying more reasonable regularization in a more compact form, which leads to better optimization. Extensive experimental results on the Dayton and CVUSA datasets show that our method generates significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Amazingren/CrossMLP.
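
The abstract does not spell out the internals of a CrossMLP block, so the following is only a rough sketch of the general MLP-Mixer idea it builds on, not the authors' implementation. It shows a generic Mixer-style block (token-mixing MLP followed by channel-mixing MLP, each with a residual connection) applied to an image code and a semantic map code that are concatenated along the token axis, so that token mixing lets the two codes exchange information. The concatenation scheme, all function names, the shapes, and the random weights are assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its channel axis (no learned scale or shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp(x, w1, w2):
    # Two-layer perceptron applied along the last axis.
    return gelu(x @ w1) @ w2


def mixer_block(x, params):
    # x has shape (tokens, channels).
    # Token mixing: transpose so the MLP acts across tokens, then transpose back.
    y = layer_norm(x)
    x = x + mlp(y.T, params["tok_w1"], params["tok_w2"]).T
    # Channel mixing: the MLP acts across channels of each token.
    y = layer_norm(x)
    return x + mlp(y, params["ch_w1"], params["ch_w2"])


def init_params(rng, tokens, channels, hidden, scale=0.02):
    # Hypothetical random weights; a trained model would learn these.
    return {
        "tok_w1": rng.normal(0.0, scale, (tokens, hidden)),
        "tok_w2": rng.normal(0.0, scale, (hidden, tokens)),
        "ch_w1": rng.normal(0.0, scale, (channels, hidden)),
        "ch_w2": rng.normal(0.0, scale, (hidden, channels)),
    }


rng = np.random.default_rng(0)
n_tokens, channels = 16, 32

# Stand-ins for the encoded image and the encoded semantic map (assumed shapes).
image_code = rng.normal(size=(n_tokens, channels))
semantic_code = rng.normal(size=(n_tokens, channels))

# Assumption: couple the two codes by concatenating them along the token axis.
joint = np.concatenate([image_code, semantic_code], axis=0)
params = init_params(rng, joint.shape[0], channels, hidden=64)

out = mixer_block(joint, params)
image_out, semantic_out = out[:n_tokens], out[n_tokens:]
print(out.shape)
```

Because the token-mixing MLP operates over all tokens of the joint sequence, the updated image tokens depend on the semantic map tokens and vice versa. The released repository linked above is the place to look for the actual block design.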

research
04/15/2019

Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

Cross-view image translation is challenging because it involves images w...
research
02/03/2020

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation

We propose a novel model named Multi-Channel Attention Selection Generat...
research
04/12/2021

Cloth Interactive Transformer for Virtual Try-On

2D image-based virtual try-on has attracted increased attention from the...
research
03/22/2022

Cross-View Panorama Image Synthesis

In this paper, we tackle the problem of synthesizing a ground-view panor...
research
08/14/2023

Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing

Multi-stage architectures have exhibited efficacy in image dehazing, whi...
research
10/23/2022

Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization

We introduce an approach to generating videos based on a series of given...
research
09/24/2019

PolSAR Image Classification Based on Dilated Convolution and Pixel-Refining Parallel Mapping network in the Complex Domain

Efficient and accurate polarimetric synthetic aperture radar (PolSAR) im...
