What does a platypus look like? Generating customized prompts for zero-shot image classification

09/07/2022
by   Sarah Pratt, et al.
3

Open vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a ") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without using explicit knowledge of the image domain and with far fewer hand-constructed sentences. To achieve this, we combine open vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that are customized for each object category. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this method requires no additional training and remains completely zero-shot. Code is available at https://github.com/sarahpratt/CuPL.

READ FULL TEXT
research
02/06/2023

CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets

Open vocabulary models (e.g. CLIP) have shown strong performance on zero...
research
11/21/2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning

Contrastive Language-Image Pre-training (CLIP) has shown promising open-...
research
08/10/2022

Patching open-vocabulary models by interpolating weights

Open-vocabulary models like CLIP achieve high accuracy across many image...
research
10/13/2022

Visual Classification via Description from Large Language Models

Vision-language models (VLMs) such as CLIP have shown promising performa...
research
05/26/2023

Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models

We focus on the challenge of out-of-distribution (OOD) detection in deep...
research
04/22/2022

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

Image classification, which classifies images by pre-defined categories,...
research
04/22/2023

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

Large language models (LLMs) have demonstrated remarkable zero-shot gene...

Please sign up or login with your details

Forgot password? Click here to reset