Improving Zero-Shot Models with Label Distribution Priors

12/01/2022
by Jonathan Kahana, et al.

Labeling large image datasets with attributes such as facial age or object type is tedious and sometimes infeasible. Supervised machine learning methods provide a highly accurate solution, but require manual labels which are often unavailable. Zero-shot models (e.g., CLIP) do not require manual labels but are not as accurate as supervised ones, particularly when the attribute is numeric. We propose a new approach, CLIPPR (CLIP with Priors), which adapts zero-shot models for regression and classification on unlabeled datasets. Our method does not use any annotated images. Instead, we assume a prior over the distribution of labels in the dataset. We then train an adapter network on top of CLIP under two competing objectives: (i) minimal change of predictions relative to the original CLIP model; and (ii) minimal distance between the predicted and the prior label distributions. Additionally, we present a novel approach for selecting prompts for vision-language models using a distributional prior. Our method is effective and presents a significant improvement over the original model. We demonstrate an improvement of 28% on a regression task. We also present promising results for classification benchmarks, improving the classification accuracy on the ImageNet dataset by 2.83%.
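For intuition, here is a minimal sketch of how such an adapter might be trained under the two competing objectives described above. It assumes precomputed, frozen CLIP image features and the original zero-shot class probabilities are already available as tensors; the class names (Adapter, clippr_style_losses, lambda_prior) are illustrative, and KL divergence is used for both loss terms as a simple stand-in, since the paper's exact loss formulation may differ.

```python
# Hedged sketch of a CLIPPR-style adapter, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Small MLP mapping frozen CLIP image features to class logits."""

    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)


def clippr_style_losses(logits, clip_probs, prior, lambda_prior=1.0):
    """Combine the two competing objectives:
    (i)  stay close to the original zero-shot CLIP predictions per image;
    (ii) make the batch-level predicted label distribution match the prior.
    """
    log_probs = F.log_softmax(logits, dim=-1)

    # (i) per-sample consistency with CLIP's zero-shot probabilities
    consistency = F.kl_div(log_probs, clip_probs, reduction="batchmean")

    # (ii) match the batch-averaged predicted distribution to the assumed prior
    batch_dist = log_probs.exp().mean(dim=0)
    prior_term = F.kl_div(batch_dist.log(), prior, reduction="sum")

    return consistency + lambda_prior * prior_term


if __name__ == "__main__":
    # Toy usage with random stand-ins for CLIP features and zero-shot probabilities.
    feat_dim, num_classes, batch = 512, 10, 64
    adapter = Adapter(feat_dim, num_classes)
    optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)

    prior = torch.full((num_classes,), 1.0 / num_classes)   # assumed uniform prior
    feats = torch.randn(batch, feat_dim)                     # frozen CLIP image features
    clip_probs = torch.softmax(torch.randn(batch, num_classes), dim=-1)

    optimizer.zero_grad()
    loss = clippr_style_losses(adapter(feats), clip_probs, prior)
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")
```

The weight lambda_prior (a hypothetical knob, not named in the abstract) trades off fidelity to CLIP's original predictions against agreement with the assumed label prior; for regression, the same idea would apply with predicted values binned or modeled as a distribution before comparison to the prior.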

