Controlling Text-to-Image Diffusion by Orthogonal Finetuning

06/12/2023
by   Zeju Qiu, et al.

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks is an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT provably preserves hyperspherical energy, which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
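The core idea of OFT, as the abstract describes it, is to finetune by rotating the pretrained weights with a learned orthogonal matrix, which leaves all pairwise angles between neuron weight vectors (and hence the hyperspherical energy) unchanged. Below is a minimal, hypothetical sketch of that idea for a single linear layer, using a Cayley transform to parametrize the orthogonal matrix; the class name `OFTLinear` and the single full-size rotation are illustrative assumptions, and the paper's efficiency tricks and the COFT radius constraint are omitted.

```python
import torch

class OFTLinear(torch.nn.Module):
    """Illustrative sketch of orthogonal finetuning for one linear layer.

    The pretrained weight W (out_dim x in_dim) is frozen. Finetuning learns a
    matrix q, from which a skew-symmetric Q = q - q^T is built; the Cayley
    transform R = (I + Q)(I - Q)^{-1} is then orthogonal. Applying R on the
    right, W' = W R, rotates every neuron's weight vector by the same
    orthogonal map, so all pairwise inner products between neurons (and thus
    the hyperspherical energy) are preserved exactly.
    """

    def __init__(self, pretrained_weight: torch.Tensor):
        super().__init__()
        # Frozen pretrained weight; only the rotation parameters are trained.
        self.weight = torch.nn.Parameter(pretrained_weight, requires_grad=False)
        in_dim = pretrained_weight.shape[1]
        # Zero init gives Q = 0, hence R = I: finetuning starts at the
        # pretrained model, as in other reparametrization-based methods.
        self.q = torch.nn.Parameter(torch.zeros(in_dim, in_dim))

    def orthogonal_matrix(self) -> torch.Tensor:
        skew = self.q - self.q.T                      # enforce Q^T = -Q
        eye = torch.eye(skew.shape[0], dtype=skew.dtype)
        return (eye + skew) @ torch.linalg.inv(eye - skew)  # Cayley transform

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rotated_weight = self.weight @ self.orthogonal_matrix()
        return x @ rotated_weight.T
```

Because R is exactly orthogonal for any value of `q`, the preservation of neuron angles holds by construction throughout training rather than as a soft penalty; COFT would additionally keep R within a bounded deviation from the identity.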


