Patching open-vocabulary models by interpolating weights

08/10/2022
by Gabriel Ilharco, et al.

Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
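The core operation described above, interpolating between the weights before and after fine-tuning, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the interpolation is linear with a single mixing coefficient, and the names `interpolate_weights`, `zero_shot`, `fine_tuned`, and `alpha` are hypothetical. Weights are modeled as plain lists of floats keyed by parameter name so the example stays self-contained.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weight values.
Weights = Dict[str, List[float]]


def interpolate_weights(zero_shot: Weights, fine_tuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * zero_shot + alpha * fine_tuned, parameter by parameter.

    alpha = 0.0 recovers the model before fine-tuning and alpha = 1.0 recovers
    the fine-tuned model. Linear mixing is an assumption made for this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameter names")

    patched: Weights = {}
    for name, w_zs in zero_shot.items():
        w_ft = fine_tuned[name]
        if len(w_zs) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two weight vectors.
        patched[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(w_zs, w_ft)]
    return patched


if __name__ == "__main__":
    before = {"layer.weight": [0.0, 1.0], "layer.bias": [0.5]}
    after = {"layer.weight": [2.0, 3.0], "layer.bias": [1.5]}
    print(interpolate_weights(before, after, alpha=0.5))
```

In this sketch the coefficient `alpha` controls the trade-off: values near 0 stay close to the original model, and values near 1 stay close to the fine-tuned model. How the coefficient is chosen is left open here.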


