No Token Left Behind: Explainability-Aided Image Classification and Generation

04/11/2022
by Roni Paiss, et al.

The application of zero-shot learning in computer vision has been revolutionized by the use of image-text matching models. The most notable example, CLIP, has been widely used for both zero-shot classification and guiding generative models with a text prompt. However, the zero-shot use of CLIP is unstable with respect to the phrasing of the input text, making it necessary to carefully engineer the prompts used. We find that this instability stems from a selective similarity score, which is based only on a subset of the semantically meaningful input tokens. To mitigate it, we present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input, in addition to employing the CLIP similarity loss used in previous works. When applied to one-shot classification through prompt engineering, our method yields an improvement in the recognition rate, without additional training or fine-tuning. Additionally, we show that CLIP guidance of generative models using our method significantly improves the generated images. Finally, we demonstrate a novel use of CLIP guidance for text-based image generation with spatial conditioning on object location, by requiring the image explainability heatmap for each object to be confined to a pre-determined bounding box.

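To make the idea concrete, below is a rough sketch (in PyTorch) of how a combined guidance objective of this kind could look. The function name, the weighting parameters, and the way relevance scores are aggregated are illustrative assumptions rather than the paper's exact formulation; the token-level and pixel-level relevance scores are assumed to come from a transformer explainability method applied to CLIP.

```python
import torch

def guidance_loss(image_emb, text_emb, token_relevance, semantic_token_idx,
                  image_relevance=None, bbox_mask=None,
                  lambda_expl=1.0, lambda_spatial=1.0):
    """Illustrative explainability-aided guidance loss (not the paper's exact loss).

    image_emb, text_emb: L2-normalized CLIP embeddings, shape (d,).
    token_relevance: per-text-token relevance scores from a transformer
        explainability method applied to CLIP, shape (num_tokens,).
    semantic_token_idx: indices of the semantically meaningful tokens
        (e.g. object and attribute words, excluding padding and stop words).
    image_relevance: optional per-pixel relevance map for one object's tokens,
        shape (H, W), values in [0, 1].
    bbox_mask: optional binary mask, 1 inside that object's bounding box.
    """
    # Standard CLIP similarity loss used in previous guidance works.
    sim_loss = 1.0 - torch.dot(image_emb, text_emb)

    # Explainability term: push the relevance of *all* semantic tokens up,
    # so the similarity score cannot rely on only a subset of them.
    sem_rel = token_relevance[semantic_token_idx]
    expl_loss = (1.0 - sem_rel).mean()

    loss = sim_loss + lambda_expl * expl_loss

    # Optional spatial conditioning: penalize the fraction of an object's
    # image relevance that falls outside its pre-determined bounding box.
    if image_relevance is not None and bbox_mask is not None:
        outside = image_relevance * (1.0 - bbox_mask)
        loss = loss + lambda_spatial * outside.sum() / (image_relevance.sum() + 1e-8)

    return loss
```

In guided generation, a loss of this form would take the place of the plain CLIP similarity term in the optimization over the image (or latent code): prompts whose semantic tokens are only partially attended to are penalized, and each object's relevance can be confined to its bounding box for spatial conditioning.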