M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing

05/24/2022
by   Zhikang Li, et al.

The fashion industry has diverse applications for multi-modal image generation and editing, which aim to create a desired high-fidelity image guided by a multi-modal conditional signal. Most existing methods learn each conditional control by introducing extra models or by ignoring style prior knowledge, which makes it difficult to handle combinations of multiple signals and leads to low-fidelity results. In this paper, we combine style prior knowledge and the flexibility of multi-modal control in one unified two-stage framework, M6-Fashion, focusing on practical AI-aided fashion design. In the first stage, it decouples style codes along both spatial and semantic dimensions to guarantee high-fidelity image generation. M6-Fashion uses self-correction in non-autoregressive generation to improve inference speed, enhance holistic consistency, and support various signal controls. Extensive experiments on a large-scale clothing dataset, M2C-Fashion, demonstrate superior performance on various image generation and editing tasks. The M6-Fashion model serves as a highly promising AI designer for the fashion industry.
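The self-correcting non-autoregressive generation mentioned above can be illustrated with a minimal sketch: all token positions are predicted in parallel, and at each step the least-confident predictions are re-masked and re-predicted. This is a generic illustration of the idea, not the paper's actual model; the `score_fn` interface and the linear re-masking schedule are assumptions for the example.

```python
import random

def nonautoregressive_decode(score_fn, seq_len, steps=4, seed=0):
    """Iterative non-autoregressive decoding with self-correction:
    every position is predicted in parallel each step, and the
    lowest-confidence tokens are re-masked for later refinement."""
    rng = random.Random(seed)
    MASK = None
    tokens = [MASK] * seq_len
    for step in range(steps):
        # Predict every position in parallel; score_fn returns a
        # (token, confidence) pair for position i given current tokens.
        preds = [score_fn(tokens, i, rng) for i in range(seq_len)]
        tokens = [tok for tok, _ in preds]
        confs = [conf for _, conf in preds]
        # Self-correction: re-mask the least-confident fraction,
        # shrinking that fraction as decoding progresses.
        n_remask = int(seq_len * (1 - (step + 1) / steps))
        if n_remask:
            worst = sorted(range(seq_len), key=lambda i: confs[i])[:n_remask]
            for i in worst:
                tokens[i] = MASK
    return tokens

# Toy "model" standing in for a real predictor: always emits token 1
# with a random confidence score.
def toy_score(tokens, i, rng):
    return 1, rng.random()

out = nonautoregressive_decode(toy_score, seq_len=8)
print(out)  # all 8 positions filled after the final step
```

Because all positions are refined in parallel, the number of decoding steps is fixed and small regardless of sequence length, which is the source of the inference-speed advantage over token-by-token autoregressive decoding.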


