Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset

01/28/2023
by   Zhixuan Liu, et al.
0

It has been shown that accurate representation in media improves the well-being of the people who consume it. By contrast, inaccurate representations can negatively affect viewers and lead to harmful perceptions of other cultures. To achieve inclusive representation in generated images, we propose a culturally-aware priming approach for text-to-image synthesis using a small but culturally curated dataset that we collected, known here as Cross-Cultural Understanding Benchmark (CCUB) Dataset, to fight the bias prevalent in giant datasets. Our proposed approach is comprised of two fine-tuning techniques: (1) Adding visual context via fine-tuning a pre-trained text-to-image synthesis model, Stable Diffusion, on the CCUB text-image pairs, and (2) Adding semantic context via automated prompt engineering using the fine-tuned large language model, GPT-3, trained on our CCUB culturally-aware text data. CCUB dataset is curated and our approach is evaluated by people who have a personal relationship with that particular culture. Our experiments indicate that priming using both text and image is effective in improving the cultural relevance and decreasing the offensiveness of generated images while maintaining quality.

READ FULL TEXT

page 1

page 3

page 4

page 5

page 6

page 10

page 11

research
07/06/2023

On the Cultural Gap in Text-to-Image Generation

One challenge in text-to-image (T2I) generation is the inadvertent refle...
research
10/05/2022

clip2latent: Text driven sampling of a pre-trained StyleGAN using denoising diffusion and CLIP

We introduce a new method to efficiently create text-to-image models fro...
research
04/14/2023

AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics

Recent advancements in language-image models have led to the development...
research
11/28/2022

The Myth of Culturally Agnostic AI Models

The paper discusses the potential of large vision-language models as obj...
research
04/16/2021

Is Your Language Model Ready for Dense Representation Fine-tuning?

Pre-trained language models (LM) have become go-to text representation e...
research
02/23/2023

Aligning Text-to-Image Models using Human Feedback

Deep generative models have shown impressive results in text-to-image sy...
research
06/14/2023

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis

Diffusion models (DMs) have recently gained attention with state-of-the-...

Please sign up or login with your details

Forgot password? Click here to reset