MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

05/24/2023
by   Marco Bellagente, et al.

The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.

research
11/27/2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation

One of the major challenges in training text-to-image generation models ...
research
05/26/2023

Generating Images with Multimodal Language Models

We propose a method to fuse frozen text-only large language models (LLMs...
research
05/09/2023

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

Diffusion models, which have emerged to become popular text-to-image gen...
research
12/01/2022

Weakly Supervised Annotations for Multi-modal Greeting Cards Dataset

In recent years, there is a growing number of pre-trained models trained...
research
08/31/2023

Enhancing Subtask Performance of Multi-modal Large Language Model

Multi-modal Large Language Model (MLLM) refers to a model expanded from ...
research
09/18/2023

Progressive Text-to-Image Diffusion with Soft Latent Direction

In spite of the rapidly evolving landscape of text-to-image generation, ...
