AltDiffusion: A Multilingual Text-to-Image Diffusion Model

08/19/2023
by Fulong Ye, et al.

Large Text-to-Image (T2I) diffusion models have shown a remarkable capability to produce photorealistic and diverse images from text inputs. However, existing models support only a few input languages, e.g., English, Chinese, and Japanese, leaving users of other languages underserved and blocking the global expansion of T2I models. This paper therefore presents AltDiffusion, a novel multilingual T2I diffusion model that supports eighteen different languages. Specifically, we first train a multilingual text encoder via knowledge distillation. We then plug it into a pretrained English-only diffusion model and train the model on a large-scale multilingual dataset with a two-stage scheme, consisting of a concept alignment stage and a quality improvement stage, to enhance its multilingual capability. Furthermore, we introduce a new benchmark, which includes the Multilingual-General-18 (MG-18) and Multilingual-Cultural-18 (MC-18) datasets, to evaluate the capabilities of T2I diffusion models for generating high-quality images and capturing culture-specific concepts in different languages. Experimental results on both MG-18 and MC-18 demonstrate that AltDiffusion outperforms current state-of-the-art T2I models, e.g., Stable Diffusion, in multilingual understanding, especially with respect to culture-specific concepts, while retaining comparable capability for generating high-quality images.
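The knowledge distillation step mentioned above can be illustrated with a toy sketch. The abstract does not state the loss or architecture, so everything here is an assumption for illustration: a frozen "teacher" embedding for an English sentence, a linear "student" that sees features of the translated sentence, and a mean-squared-error objective that pulls the student output toward the teacher output. The names `linear`, `mse`, `distill_step`, and the toy `pairs` data are hypothetical and are not taken from the paper.

```python
# Toy distillation sketch (assumed MSE objective, hypothetical names and data).

def linear(weights, x):
    """Apply a linear map (given as a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def distill_step(student_w, student_in, teacher_out, lr=0.1):
    """One gradient step on MSE(student(x), teacher_out).

    Updates student_w in place and returns the loss measured before the update.
    The teacher output is treated as a fixed target (the teacher is frozen).
    """
    pred = linear(student_w, student_in)
    n = len(pred)
    for i, row in enumerate(student_w):
        scale = 2.0 * (pred[i] - teacher_out[i]) / n  # dLoss/dpred[i]
        for j in range(len(row)):
            row[j] -= lr * scale * student_in[j]
    return mse(pred, teacher_out)

# Each pair: (student features for a translated sentence,
#             frozen teacher embedding of the English sentence). Made-up numbers.
pairs = [
    ([0.5, -1.0, 0.25], [0.2, -0.1, 0.4, 0.3]),
    ([1.0, 0.2, -0.4], [-0.3, 0.5, 0.1, 0.0]),
]

student_w = [[0.0] * 3 for _ in range(4)]  # 3-d input -> 4-d embedding
losses = []
for epoch in range(300):
    epoch_loss = sum(distill_step(student_w, x, t) for x, t in pairs) / len(pairs)
    losses.append(epoch_loss)

print(losses[0], losses[-1])
```

In a real system the student would be a multilingual transformer and the teacher a pretrained text encoder. The only point of the sketch is that the student is trained to reproduce the teacher's embedding space, which is what would allow it to be plugged into a diffusion model trained against that space.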


research
05/19/2023

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

Diffusion models have made impressive progress in text-to-image synthesi...
research
10/07/2022

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Multilingual text-video retrieval methods have improved significantly in...
research
09/30/2021

Multilingual AMR Parsing with Noisy Knowledge Distillation

We study multilingual AMR parsing from the perspective of knowledge dist...
research
09/04/2023

NLLB-CLIP – train performant multilingual image retrieval model on a budget

Today, the exponential rise of large models developed by academic and in...
research
07/23/2014

Joint Energy-based Detection and Classificationon of Multilingual Text Lines

This paper proposes a new hierarchical MDL-based model for a joint detec...
research
05/13/2022

Talking Face Generation with Multilingual TTS

In this work, we propose a joint system combining a talking face generat...
research
01/13/2023

In BLOOM: Creativity and Affinity in Artificial Lyrics and Art

We apply a large multilingual language model (BLOOM-176B) in open-ended ...
