Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

04/04/2023
by Alberto Baldrati, et al.

Designers use fashion illustration to communicate their vision and to carry a design idea from conceptualization to realization, showing how clothes interact with the human body. Computer vision can therefore be used to improve the fashion design process. Unlike previous works, which mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing: guiding the generation of human-centric fashion images with multimodal prompts such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations will be publicly released at: https://github.com/aimagelab/multimodal-garment-designer.
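
One common way to feed spatial prompts such as pose maps and sketches to a latent diffusion denoiser is to concatenate them channel-wise with the noisy latent, while text usually enters through cross-attention. The abstract does not say how this architecture combines its inputs, so the following is only a minimal sketch under that assumption. The names `ConditioningSpec` and `denoiser_input_channels` are hypothetical, and every channel count is an illustrative placeholder rather than a value from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConditioningSpec:
    """Hypothetical description of one spatial conditioning input."""
    name: str
    channels: int


def denoiser_input_channels(latent_channels: int,
                            spatial_conditions: Sequence[ConditioningSpec]) -> int:
    """Return the channel count of the denoiser input, assuming every
    spatial condition is concatenated channel-wise with the noisy latent."""
    return latent_channels + sum(c.channels for c in spatial_conditions)


# Illustrative values only (assumptions, not taken from the abstract):
# a 4-channel latent, an 18-channel pose map, and a 1-channel sketch.
conditions = [
    ConditioningSpec("pose_map", 18),
    ConditioningSpec("sketch", 1),
]
total = denoiser_input_channels(4, conditions)
print(f"denoiser input channels: {total}")
```

Under this assumption, adding or removing a spatial modality only changes the width of the denoiser's first layer, which is one reason channel concatenation is a popular choice for spatially aligned conditions.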
