Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

05/19/2023
by Jinyi Hu, et al.

Diffusion models have made impressive progress in text-to-image synthesis. However, training such large-scale models (e.g., Stable Diffusion) from scratch requires high computational cost and massive numbers of high-quality text-image pairs, which is unaffordable in other languages. To address this challenge, we propose IAP, a simple but effective method for transferring English Stable Diffusion to Chinese. IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantic space with the English one in CLIP. To achieve this, we treat images as pivots and minimize the distance between the attentive features produced by cross-attention between images and each language, respectively. In this way, IAP efficiently connects Chinese, English, and visual semantics in CLIP's embedding space, improving the quality of images generated directly from Chinese prompts. Experimental results show that our method outperforms several strong Chinese diffusion models with only 5
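The image-as-pivot objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cross_attention` and `pivot_loss` are hypothetical, plain Python lists stand in for CLIP feature tensors, and the choices of image features as queries, text features as both keys and values, and mean squared error as the distance are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_feats, text_feats):
    """Each image feature (query) attends over text token features.

    Assumption: text features serve as both keys and values, with
    scaled dot-product scores. Returns one attentive feature per query.
    """
    dim = len(image_feats[0])
    attended = []
    for query in image_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in text_feats
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * value[j] for w, value in zip(weights, text_feats))
            for j in range(dim)
        ])
    return attended


def pivot_loss(image_feats, en_text_feats, zh_text_feats):
    """Mean squared distance between the English and Chinese attentive features.

    Assumption: in training, only the encoder producing zh_text_feats
    would be updated to reduce this value; everything else stays fixed.
    """
    att_en = cross_attention(image_feats, en_text_feats)
    att_zh = cross_attention(image_feats, zh_text_feats)
    squared = [
        (a - b) ** 2
        for row_en, row_zh in zip(att_en, att_zh)
        for a, b in zip(row_en, row_zh)
    ]
    return sum(squared) / len(squared)


# Toy features: 2 image patches, 3 English tokens, 2 Chinese tokens, dim 2.
image = [[1.0, 0.0], [0.0, 1.0]]
english = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
chinese = [[0.4, 0.6], [0.9, -1.1]]
loss = pivot_loss(image, english, chinese)
```

Because both languages are pooled through the same image queries, the two attentive feature sets have the same shape even when the English and Chinese prompts have different token counts, which is what makes a direct distance between them well defined in this sketch.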


research
08/19/2023

AltDiffusion: A Multilingual Text-to-Image Diffusion Model

Large Text-to-Image (T2I) diffusion models have shown a remarkable capabi...
research
09/11/2023

PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud

Text-to-image synthesis for the Chinese language poses unique challenges...
research
06/26/2018

Neural Cross-Lingual Coreference Resolution and its Application to Entity Linking

We propose an entity-centric neural cross-lingual coreference model that...
research
06/07/2021

Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference

Multilingual transformers (XLM, mT5) have been shown to have remarkable ...
research
12/09/2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Large-scale diffusion models have achieved state-of-the-art results on t...
research
08/06/2023

Towards Scene-Text to Scene-Text Translation

In this work, we study the task of "visually" translating scene text fro...
research
09/08/2023

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

Recent advancements in diffusion models have showcased their impressive ...
