
Robust fine-tuning of zero-shot models

09/04/2021
by Mitchell Wortsman, et al.

Large pre-trained models such as CLIP offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they also reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models. Compared to standard fine-tuning, the resulting weight-space ensembles provide large accuracy improvements out-of-distribution, while matching or improving in-distribution accuracy. On ImageNet and five derived distribution shifts, weight-space ensembles improve out-of-distribution accuracy by 2 to 10 percentage points while increasing in-distribution accuracy by nearly 1 percentage point relative to standard fine-tuning. These improvements come at no additional computational cost during fine-tuning or inference.


Related Research

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning (11/06/2022)
Large pre-trained, zero-shot capable models have shown considerable succ...

Patching open-vocabulary models by interpolating weights (08/10/2022)
Open-vocabulary models like CLIP achieve high accuracy across many image...

Context-Aware Robust Fine-Tuning (11/29/2022)
Contrastive Language-Image Pre-trained (CLIP) models have zero-shot abil...

Exploring The Landscape of Distributional Robustness for Question Answering Models (10/22/2022)
We conduct a large empirical evaluation to investigate the landscape of ...

Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization (11/25/2021)
Domain generalization (DG) is a difficult transfer learning problem aimi...

The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning (06/30/2021)
Although machine learning models typically experience a drop in performa...