Better Aligning Text-to-Image Models with Human Preference

03/25/2023
by Xiaoshi Wu, et al.

Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human aesthetic preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier with the collected dataset and derive a Human Preference Score (HPS) based on the classifier. Using the HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human aesthetic preferences. Our experiments show that the HPS outperforms CLIP in predicting human choices and has good generalization capability towards images generated from other models. By tuning Stable Diffusion with the guidance of the HPS, the adapted model is able to generate images that are more preferred by human users.
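The abstract does not say how the preference classifier becomes a single score. Below is a minimal sketch that assumes a CLIP-style design: a prompt embedding and several candidate image embeddings come from some pretrained encoder (not shown), each image is scored by scaled cosine similarity to the prompt, and the image a human picked is treated as the class label of a softmax over the candidates. The function names, the `scale=100.0` factor, and the toy embeddings are illustrative assumptions, not the authors' code. `agreement` measures how often the top-scored image matches the human choice, which is one way to test whether a metric "predicts human choices".

```python
import math
from typing import List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def preference_score(text_emb: Sequence[float], image_emb: Sequence[float],
                     scale: float = 100.0) -> float:
    """Hypothetical HPS-style score: scaled prompt-image cosine similarity."""
    return scale * cosine(text_emb, image_emb)

def choice_probabilities(text_emb: Sequence[float],
                         image_embs: List[Sequence[float]],
                         scale: float = 100.0) -> List[float]:
    """Softmax over the candidates' scores (assumed classifier head)."""
    scores = [preference_score(text_emb, e, scale) for e in image_embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def choice_loss(text_emb: Sequence[float], image_embs: List[Sequence[float]],
                chosen: int, scale: float = 100.0) -> float:
    """Cross-entropy of the human's choice; a training signal for the classifier."""
    probs = choice_probabilities(text_emb, image_embs, scale)
    return -math.log(probs[chosen])

Record = Tuple[Sequence[float], List[Sequence[float]], int]

def agreement(records: List[Record]) -> float:
    """Fraction of records where the top-scored image is the one the human chose."""
    hits = 0
    for text_emb, image_embs, chosen in records:
        scores = [preference_score(text_emb, e) for e in image_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == chosen)
    return hits / len(records)
```

With toy 2-D embeddings `text = [1.0, 0.0]` and `images = [[1.0, 0.0], [0.0, 1.0]]`, the first image scores 100.0 and the second 0.0. `agreement` is 0.5 for one record where the human chose image 0 and one where they chose image 1. Because the score is a plain dot product of normalized embeddings, ranking candidates is cheap once the embeddings are computed.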
