Towards Reliable Zero Shot Classification in Self-Supervised Models with Conformal Prediction

by Bhawesh Kumar, et al.
Harvard University

Self-supervised models trained with a contrastive loss, such as CLIP, have been shown to be very powerful in zero-shot classification settings. However, to be used as a zero-shot classifier, these models require the user to provide new captions over a fixed set of labels at test time. In many settings, it is hard or impossible to know whether a new query caption is compatible with the source captions used to train the model. We address these limitations by framing the zero-shot classification task as an outlier detection problem and developing a conformal prediction procedure to assess when a given test caption may be reliably used. On a real-world medical example, we show that our proposed conformal procedure improves the reliability of CLIP-style models in the zero-shot classification setting, and we provide an empirical analysis of the factors that may affect its performance.
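
To make the outlier-detection framing concrete, here is a minimal sketch of a generic split-conformal outlier test in Python. It is not the authors' procedure: the abstract does not specify the nonconformity score, so the cosine-distance score below, the function names (`nonconformity`, `conformal_p_value`, `is_reliable`), and the threshold `alpha` are all illustrative assumptions. The idea is to score a query caption embedding against reference caption embeddings, then compare that score with scores from a held-out calibration set.

```python
import numpy as np


def nonconformity(query_emb, reference_embs):
    """Hypothetical score: 1 minus the largest cosine similarity between
    the query caption embedding and any reference caption embedding.
    Larger values mean the query looks less like the reference captions."""
    q = np.asarray(query_emb, dtype=float)
    r = np.asarray(reference_embs, dtype=float)
    q = q / np.linalg.norm(q)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    return 1.0 - float(np.max(r @ q))


def conformal_p_value(calibration_scores, test_score):
    """Split-conformal p-value: the (smoothed) fraction of calibration
    scores that are at least as large as the test score."""
    scores = np.asarray(calibration_scores, dtype=float)
    n = scores.size
    return (1 + int(np.sum(scores >= test_score))) / (n + 1)


def is_reliable(calibration_scores, test_score, alpha=0.1):
    """Treat the query caption as usable when it is not flagged as an
    outlier at level alpha (assumed threshold, chosen by the user)."""
    return conformal_p_value(calibration_scores, test_score) > alpha
```

Under the usual exchangeability assumption, a caption drawn from the same distribution as the calibration captions is flagged with probability at most `alpha`; a caption that gets a small p-value is a candidate for rejection rather than for use as a zero-shot label prompt.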


A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

Using natural language as a supervision for training visual recognition ...

Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

We investigate semi-structured document classification in a zero-shot se...

FROB: Few-shot ROBust Model for Classification and Out-of-Distribution Detection

Nowadays, classification and Out-of-Distribution (OoD) detection in the ...

Cognitively Aided Zero-Shot Automatic Essay Grading

Automatic essay grading (AEG) is a process in which machines assign a gr...

Improving Zero-Shot Models with Label Distribution Priors

Labeling large image datasets with attributes such as facial age or obje...

Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

Reinforcement learning from large-scale offline datasets provides us wit...

Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation

Traditional computer vision models are trained to predict a fixed set of...