CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy

06/27/2023
by Xianhang Li, et al.

The recent work CLIPA presents an inverse scaling law for CLIP training – whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. This finding enables us to train high-performance CLIP models with significantly reduced computations. Building upon this work, we hereby present CLIPA-v2 with two key contributions. Technically, we find this inverse scaling law is also applicable in the finetuning stage, enabling further reduction in computational needs. Empirically, we explore CLIPA at scale, extending the experiments up to the H/14 model with ~13B image-text pairs seen during training. Our results are exciting – by only allocating a budget of $10,000, our CLIP model achieves an impressive zero-shot ImageNet accuracy of 81.1%, surpassing the prior best CLIP model (from OpenCLIP, 80.1%) while reducing the computational cost by ~39X. Moreover, with an additional investment of $4,000, we can further elevate the zero-shot ImageNet accuracy to 81.8%. Our code and models are available at https://github.com/UCSC-VLAA/CLIPA.
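The computational savings come from shortening the token sequences the encoders process during pre-training, followed by a brief finetuning stage (itself shortened in CLIPA-v2 via the same inverse scaling law). Below is a minimal PyTorch sketch of one such length-reduction strategy, random image-token masking; the function name, the keep_ratio parameter, and the masking scheme are illustrative assumptions for exposition, not the CLIPA-v2 implementation.

    import torch

    def random_mask_tokens(patch_tokens: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        # patch_tokens: (batch, seq_len, dim) image patch embeddings.
        # keep_ratio: fraction of tokens kept, e.g. 0.25 yields a 4x shorter
        # sequence and roughly quadratic savings in attention cost.
        b, n, d = patch_tokens.shape
        n_keep = max(1, int(n * keep_ratio))
        # Draw an independent random score per token, then keep the tokens
        # with the lowest scores: a uniform random subset without replacement.
        noise = torch.rand(b, n, device=patch_tokens.device)
        keep_idx = noise.argsort(dim=1)[:, :n_keep]          # (b, n_keep)
        keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, d)  # (b, n_keep, d)
        return patch_tokens.gather(dim=1, index=keep_idx)

    # Example: a ViT-H/14 image at 224px yields 256 patch tokens; keeping
    # 25% passes only 64 tokens through the transformer during pre-training.
    tokens = torch.randn(8, 256, 1280)
    short = random_mask_tokens(tokens, keep_ratio=0.25)
    assert short.shape == (8, 64, 1280)

At finetuning time the model would instead see the full 256-token sequence, recovering accuracy at a small fraction of the total training cost.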

