T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

02/16/2023 ∙ by Chong Mou, et al.
The impressive generative ability of large-scale text-to-image (T2I) models demonstrates their strong capacity to learn complex structures and meaningful semantics. However, relying solely on text prompts cannot fully exploit the knowledge the model has learned, especially when flexible and accurate control (e.g., over color and structure) is needed. In this paper, we aim to “dig out” the capabilities that T2I models have implicitly learned and then explicitly use them to control generation at a finer granularity. Specifically, we propose learning simple and lightweight T2I-Adapters that align the internal knowledge of T2I models with external control signals, while keeping the original large T2I models frozen. In this way, we can train various adapters for different conditions, achieving rich control and editing effects over the color and structure of the generated results. Furthermore, the proposed T2I-Adapters have attractive practical properties, such as composability and generalization ability. Extensive experiments demonstrate that our T2I-Adapter achieves promising generation quality and supports a wide range of applications.
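The core idea described above — a small trainable network that maps an external condition (e.g., a sketch or depth map) into multi-scale features injected into a frozen T2I UNet — can be sketched as follows. This is a minimal illustrative module, not the paper's implementation: the class name `TinyT2IAdapter`, the channel widths, and the downsampling factor are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class TinyT2IAdapter(nn.Module):
    """Illustrative sketch (not the paper's architecture): map a condition
    image to one feature map per UNet encoder scale. During training, only
    this adapter is updated; the large T2I diffusion model stays frozen,
    and each returned feature would be added to the matching encoder
    feature of the frozen UNet."""

    def __init__(self, cond_channels=3, widths=(64, 128, 256, 256)):
        super().__init__()
        # Pixel-unshuffle brings a 512x512 condition down to 64x64,
        # roughly matching a latent-space UNet's input resolution.
        self.unshuffle = nn.PixelUnshuffle(8)
        blocks, in_ch = [], cond_channels * 8 * 8
        for i, w in enumerate(widths):
            stride = 1 if i == 0 else 2  # halve resolution at each deeper scale
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, w, 3, stride=stride, padding=1),
                nn.SiLU(),
                nn.Conv2d(w, w, 3, padding=1),
            ))
            in_ch = w
        self.blocks = nn.ModuleList(blocks)

    def forward(self, cond):
        x = self.unshuffle(cond)
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)  # one feature map per encoder scale
        return feats

adapter = TinyT2IAdapter()
feats = adapter(torch.randn(1, 3, 512, 512))
print([tuple(f.shape) for f in feats])
```

The abstract's "composability" would then amount to summing the feature lists of several such adapters (optionally with per-adapter weights) before injection, since each adapter produces features in the same shared space.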


Related research

∙ 06/12/2023 — Controlling Text-to-Image Diffusion by Orthogonal Finetuning
∙ 02/20/2023 — Composer: Creative and Controllable Image Synthesis with Composable Conditions
∙ 05/28/2023 — Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?
∙ 10/02/2020 — MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models
∙ 03/08/2023 — Video-P2P: Video Editing with Cross-attention Control
∙ 05/24/2023 — DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
∙ 08/23/2023 — High-quality Image Dehazing with Diffusion Model
