An Inverse Scaling Law for CLIP Training

05/11/2023
by Xianhang Li, et al.

CLIP, the first foundation model that connects images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we showcase that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law. As a result of this finding, we are able to successfully train CLIP even by using academic resources. For example, on an A100 eight-GPU server, our CLIP models achieve zero-shot top-1 ImageNet accuracies of 63.2% in ~2 days, 67.8% in ~3 days, and 69.3% in ~4 days. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics. Our code is available at https://github.com/UCSC-VLAA/CLIPA.
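The lever behind this result is the per-sample token budget: transformer compute grows with sequence length (quadratically in the attention blocks), so shortening the image/text token sequences frees compute that can instead go into larger encoders. As a rough illustration only (not code from the CLIPA repository; the function names here are hypothetical), the sketch below shows two simple token-reduction strategies of the kind the paper compares: shrinking the input image so a ViT produces fewer patch tokens, and truncating text to a shorter token sequence.

```python
import torch
import torch.nn.functional as F

def image_token_length(image_size: int, patch_size: int) -> int:
    # A ViT splits a square image into (size/patch)^2 non-overlapping patches,
    # each becoming one token.
    return (image_size // patch_size) ** 2

def reduce_image_tokens_by_resizing(images: torch.Tensor, target_size: int) -> torch.Tensor:
    # One image-side reduction strategy: downsample the input so the encoder
    # sees fewer patches (masking out patches is an alternative).
    return F.interpolate(images, size=(target_size, target_size),
                         mode="bilinear", align_corners=False)

def reduce_text_tokens_by_truncation(token_ids: torch.Tensor, max_len: int) -> torch.Tensor:
    # The simplest text-side reduction strategy: keep only the first max_len tokens.
    return token_ids[:, :max_len]

# A 224px image with 16px patches yields 196 tokens; resizing to 112px
# cuts that to 49 tokens, roughly a 4x saving in attention cost per layer.
images = torch.randn(8, 3, 224, 224)
small = reduce_image_tokens_by_resizing(images, 112)
print(image_token_length(224, 16), image_token_length(112, 16))  # 196 49
```

The abstract's point is that the choice among such strategies matters, and that larger encoders tolerate more aggressive reduction, which is what makes training feasible on a single eight-GPU server.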

Related research

06/27/2023 · CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
The recent work CLIPA presents an inverse scaling law for CLIP training ...

12/14/2022 · Reproducible scaling laws for contrastive language-image learning
Scaling up neural networks has led to remarkable performance across a wi...

07/22/2022 · Zero-Shot Video Captioning with Evolving Pseudo-Tokens
We introduce a zero-shot video captioning method that employs two frozen...

10/18/2022 · MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Existing vision-text contrastive learning like CLIP aims to match the pa...

10/26/2022 · Broken Neural Scaling Laws
We present a smoothly broken power law functional form that accurately m...

06/14/2023 · Generate to Understand for Representation
In recent years, a significant number of high-quality pretrained models ...

02/24/2022 · Auto-scaling Vision Transformers without Training
This work targets automated designing and scaling of Vision Transformers...
