On the Cultural Gap in Text-to-Image Generation

07/06/2023
by Bingshuai Liu, et al.

One challenge in text-to-image (T2I) generation is the inadvertent reflection of cultural gaps present in the training data: the quality of generated images degrades when the cultural elements of the input text are rare in the training set. Although various T2I models have shown impressive but arbitrarily chosen examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture; the filtered data is then used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, with object-text alignment proving crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).

research
01/28/2023

Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset

It has been shown that accurate representation in media improves the wel...
research
05/18/2023

X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models

This paper introduces a novel explainable image quality evaluation appro...
research
07/11/2023

TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation

The progress in the generation of synthetic images has made it crucial t...
research
12/02/2021

TISE: A Toolbox for Text-to-Image Synthesis Evaluation

In this paper, we conduct a study on state-of-the-art methods for single...
research
02/10/2021

Culture-inspired Multi-modal Color Palette Generation and Colorization: A Chinese Youth Subculture Case

Color is an essential component of graphic design, acting not only as a ...
research
06/04/2023

Detector Guidance for Multi-Object Text-to-Image Generation

Diffusion models have demonstrated impressive performance in text-to-ima...
research
04/11/2023

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

In recent years, Text-to-Image (T2I) models have been extensively studie...
