Robust fine-tuning of zero-shot models

by Mitchell Wortsman, et al.

Large pre-trained models such as CLIP offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they also reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models. Compared to standard fine-tuning, the resulting weight-space ensembles provide large accuracy improvements out-of-distribution, while matching or improving in-distribution accuracy. On ImageNet and five derived distribution shifts, weight-space ensembles improve out-of-distribution accuracy by 2 to 10 percentage points while increasing in-distribution accuracy by nearly 1 percentage point relative to standard fine-tuning. These improvements come at no additional computational cost during fine-tuning or inference.
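
The abstract describes ensembling the weights of the zero-shot and fine-tuned models. A minimal sketch of that idea follows, assuming the two models share an architecture and that the ensemble is a linear interpolation controlled by a mixing coefficient. The function name `interpolate_weights`, the parameter `alpha`, and the plain-dict representation of model weights are illustrative assumptions, not names taken from the abstract.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of floats.
StateDict = Dict[str, List[float]]


def interpolate_weights(zero_shot: StateDict,
                        fine_tuned: StateDict,
                        alpha: float) -> StateDict:
    """Return (1 - alpha) * zero_shot + alpha * fine_tuned, parameter by parameter.

    alpha = 0.0 recovers the zero-shot weights; alpha = 1.0 recovers the
    fine-tuned weights. The linear form is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameter names")

    merged: StateDict = {}
    for name, w_zs in zero_shot.items():
        w_ft = fine_tuned[name]
        if len(w_zs) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two weight vectors.
        merged[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(w_zs, w_ft)]
    return merged


# Toy usage with made-up numbers (not results from the paper).
zero_shot = {"layer.weight": [0.0, 2.0]}
fine_tuned = {"layer.weight": [1.0, 4.0]}
merged = interpolate_weights(zero_shot, fine_tuned, alpha=0.5)
```

Because the merged result is a single set of weights, inference runs one model rather than two, which is consistent with the abstract's claim of no additional computational cost at inference time.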



Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Large pre-trained, zero-shot capable models have shown considerable succ...

Patching open-vocabulary models by interpolating weights

Open-vocabulary models like CLIP achieve high accuracy across many image...

Context-Aware Robust Fine-Tuning

Contrastive Language-Image Pre-trained (CLIP) models have zero-shot abil...

Exploring The Landscape of Distributional Robustness for Question Answering Models

We conduct a large empirical evaluation to investigate the landscape of ...

Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization

Domain generalization (DG) is a difficult transfer learning problem aimi...

The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning

Although machine learning models typically experience a drop in performa...