ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

06/07/2023
by Maitreya Patel, et al.

The ability to understand visual concepts, and to replicate and compose these concepts from images, is a central goal of computer vision. Recent advances in text-to-image (T2I) models have led to high-definition, realistic image generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and on limited qualitative measures of visual understanding. To quantify the ability of T2I models to learn and synthesize novel visual concepts, we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts, 5K unique concept compositions, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in ground-truth images. We evaluate visual concepts that are objects, attributes, or styles, and we also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning concepts and preserving compositionality, which existing approaches struggle to overcome.
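The abstract states only that CCD uses the confidence of oracle concept classifiers to compare generated images against ground-truth images; it does not give the formula. The Python sketch here assumes, purely for illustration, that CCD is the gap between the oracle's mean confidence in the target concept on ground-truth images and its mean confidence on generated images. The names `softmax`, `target_confidence`, and `concept_confidence_deviation` are hypothetical, and the paper's actual definition may differ (for example, in normalization or in how confidence is measured).

```python
import math
from statistics import fmean
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over an oracle classifier's raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_confidence(logits: Sequence[float], target_idx: int) -> float:
    """Oracle confidence (softmax probability) assigned to the target concept."""
    return softmax(logits)[target_idx]


def concept_confidence_deviation(
    real_conf: Sequence[float],
    generated_conf: Sequence[float],
) -> float:
    """Hypothetical CCD sketch (assumption, not the paper's exact formula):
    mean target-concept confidence on ground-truth images minus the mean on
    generated images. 0 means the generator matches the real images; a
    positive value means the oracle recognizes the concept less confidently
    in the generated images."""
    if not real_conf or not generated_conf:
        raise ValueError("both confidence lists must be non-empty")
    for c in (*real_conf, *generated_conf):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must be probabilities in [0, 1]")
    return fmean(real_conf) - fmean(generated_conf)


if __name__ == "__main__":
    real = [1.0, 0.75, 0.5]       # oracle confidences on ground-truth images
    generated = [0.5, 0.25, 0.75]  # oracle confidences on generated images
    print(concept_confidence_deviation(real, generated))  # 0.25
```

Under this assumed definition, a lower CCD indicates that generated images carry the concept about as recognizably as the real ones; averaging per-image confidences keeps the sketch independent of how many images each side contributes.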


