Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

05/03/2022
by   Alex Fang, et al.

Contrastively trained image-text models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these image-text models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v) the contrastive loss function. Our experiments show that the more diverse training distribution is the main cause for the robustness gains, with the other factors contributing little to no robustness. Beyond our experimental results, we also introduce ImageNet-Captions, a version of ImageNet with original text annotations from Flickr, to enable further controlled experiments of language-image training.
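For readers unfamiliar with the two mechanisms the abstract refers to, the snippet below is a minimal sketch of (a) a CLIP-style symmetric contrastive loss over paired image/text embeddings and (b) zero-shot classification via language supervision at test time. It assumes PyTorch; the function names, embedding dimension, and random stand-in features are illustrative assumptions, not the paper's actual setup.

import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Each image is pulled toward its own caption and pushed away from the
    other captions in the batch, and vice versa.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2


def zero_shot_predict(image_features: torch.Tensor,
                      class_text_features: torch.Tensor) -> torch.Tensor:
    """Language supervision at test time: assign each image to the nearest
    class prompt embedding (e.g. from "a photo of a {class name}")."""
    image_features = F.normalize(image_features, dim=-1)
    class_text_features = F.normalize(class_text_features, dim=-1)
    return (image_features @ class_text_features.t()).argmax(dim=-1)


if __name__ == "__main__":
    batch, dim, num_classes = 8, 512, 10
    img = torch.randn(batch, dim)   # stand-in for image-encoder outputs
    txt = torch.randn(batch, dim)   # stand-in for text-encoder outputs
    print("contrastive loss:", clip_contrastive_loss(img, txt).item())

    class_txt = torch.randn(num_classes, dim)  # one embedding per class prompt
    print("zero-shot predictions:", zero_shot_predict(img, class_txt))

The study's controlled comparisons vary the data and supervision fed into this kind of objective; the loss itself, per the abstract, is not the main source of the robustness gains.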



Related research

07/01/2020
Measuring Robustness to Natural Distribution Shifts in Image Classification
We study how robust current ImageNet models are to distribution shifts a...

02/02/2023
Effective Robustness against Natural Distribution Shifts for Models with Different Training Data
“Effective robustness” measures the extra out-of-distribution (OOD) robu...

05/31/2023
Improving CLIP Training with Language Rewrites
Contrastive Language-Image Pre-training (CLIP) stands as one of the most...

01/19/2023
Self Supervision Does Not Help Natural Language Supervision at Scale
Self supervision and natural language supervision have emerged as two ex...

08/10/2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Web-crawled datasets have enabled remarkable generalization capabilities...

06/17/2021
Poisoning and Backdooring Contrastive Learning
Contrastive learning methods like CLIP train on noisy and uncurated trai...

11/19/2021
Combined Scaling for Zero-shot Transfer Learning
We present a combined scaling method called BASIC that achieves 85.7 zer...
