I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

05/24/2023
by Tuhin Chakrabarty, et al.

Visual metaphors are powerful rhetorical devices used to persuade or communicate creative ideas through images. Similar to linguistic metaphors, they convey meaning implicitly through symbolism and juxtaposition of the symbols. We propose a new task of generating visual metaphors from linguistic metaphors. This is a challenging task for diffusion-based text-to-image models, such as DALL·E 2, since it requires the ability to model implicit meaning and compositionality. We propose to solve the task through the collaboration between Large Language Models (LLMs) and Diffusion Models: Instruct GPT-3 (davinci-002) with Chain-of-Thought prompting generates text that represents a visual elaboration of the linguistic metaphor containing the implicit meaning and relevant objects, which is then used as input to the diffusion-based text-to-image models. Using a human-AI collaboration framework, where humans interact both with the LLM and the top-performing diffusion model, we create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations. Evaluation by professional illustrators shows the promise of LLM-Diffusion Model collaboration for this task. To evaluate the utility of our Human-AI collaboration framework and the quality of our dataset, we perform both an intrinsic human-based evaluation and an extrinsic evaluation using visual entailment as a downstream task.
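The two-stage flow described in the abstract (an LLM prompted with chain-of-thought examples writes a visual elaboration, which then becomes the prompt for a text-to-image model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the "Visual elaboration:" marker, and the names `build_cot_prompt`, `extract_elaboration`, `co_create`, and `VisualMetaphor` are all hypothetical, and the two models are passed in as plain callables so that no particular API is assumed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical few-shot record: (metaphor, reasoning, visual elaboration).
# The paper's actual prompt format is not reproduced here.
FewShot = Tuple[str, str, str]

MARKER = "Visual elaboration:"  # assumed output marker, chosen for this sketch


def build_cot_prompt(metaphor: str, examples: List[FewShot]) -> str:
    """Assemble a chain-of-thought prompt that ends where the LLM should continue."""
    parts = ["Describe an image that conveys the metaphor, reasoning step by step."]
    for ex_metaphor, ex_reasoning, ex_elaboration in examples:
        parts.append(
            f"Metaphor: {ex_metaphor}\n"
            f"Reasoning: {ex_reasoning}\n"
            f"{MARKER} {ex_elaboration}"
        )
    parts.append(f"Metaphor: {metaphor}\nReasoning:")
    return "\n\n".join(parts)


def extract_elaboration(completion: str) -> str:
    """Return the text after the last marker, or the whole completion if absent."""
    _, found, tail = completion.rpartition(MARKER)
    return tail.strip() if found else completion.strip()


@dataclass
class VisualMetaphor:
    metaphor: str
    elaboration: str
    image: Any  # whatever the text-to-image callable returns


def co_create(
    metaphor: str,
    examples: List[FewShot],
    llm: Callable[[str], str],            # prompt -> completion (assumed interface)
    text_to_image: Callable[[str], Any],  # prompt -> image (assumed interface)
) -> VisualMetaphor:
    """Run the LLM stage, then feed its elaboration to the diffusion stage."""
    prompt = build_cot_prompt(metaphor, examples)
    elaboration = extract_elaboration(llm(prompt))
    return VisualMetaphor(metaphor, elaboration, text_to_image(elaboration))
```

Keeping both models behind callables leaves room for the human-in-the-loop step the abstract mentions: a person could review or edit the elaboration between the two stages before any image is generated.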


