A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

02/13/2023
by   James Urquhart Allingham, et al.

Contrastively trained text-image models have the remarkable ability to perform zero-shot classification: classifying previously unseen images into categories the model was never explicitly trained to identify. However, these zero-shot classifiers require prompt engineering to achieve high accuracy, which typically means hand-crafting a set of prompts for each downstream task. In this work, we aim to automate prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask: "Given a large pool of prompts, can we automatically score the prompts and ensemble those most suitable for a particular downstream dataset, without access to labeled validation data?" We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt-scoring method, where the score can easily become overconfident due to biases in the pre-training and test data, and we propose a novel prompt-scoring method that corrects for these biases. Using the proposed scores to form a weighted-average prompt ensemble, our method outperforms an equal-average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, all while being fully automatic, optimization-free, and not requiring access to labeled validation data.
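To make the weighted-average prompt ensemble concrete, the following is a minimal sketch of how per-prompt scores can weight class text embeddings before zero-shot classification. The array shapes, function names, and the prompt weights themselves are illustrative assumptions; the paper's actual bias-corrected scoring method is not reproduced here, and the weights below stand in for its output.

```python
import numpy as np

def weighted_prompt_ensemble(text_embs, prompt_weights):
    """Combine per-prompt class embeddings into a single classifier.

    text_embs: (P, C, D) array of L2-normalized embeddings, one for each
        of P prompt templates filled with each of C class names.
    prompt_weights: (P,) nonnegative scores; a placeholder for scores
        produced by a prompt-scoring method such as the paper's.
    """
    w = prompt_weights / prompt_weights.sum()           # normalize to sum to 1
    ens = np.einsum('p,pcd->cd', w, text_embs)          # weighted average over prompts
    ens /= np.linalg.norm(ens, axis=-1, keepdims=True)  # re-normalize each class vector
    return ens                                          # (C, D) class embeddings

def zero_shot_classify(image_embs, class_embs):
    """Zero-shot prediction: cosine similarity, argmax over classes."""
    logits = image_embs @ class_embs.T                  # (N, C) similarity scores
    return logits.argmax(axis=-1)                       # (N,) predicted class indices
```

Setting all weights equal recovers the standard equal-average ensemble, so the scoring method only changes how much each prompt contributes, with no optimization involved.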
