Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

05/22/2023
by Ibrahim Alabdulmohsin, et al.

Scaling laws have recently been employed to derive the compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, at less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave the way for more informed scaling.

research
06/08/2021

Scaling Vision Transformers

Attention-based neural networks such as the Vision Transformer (ViT) hav...
research
12/14/2022

Reproducible scaling laws for contrastive language-image learning

Scaling up neural networks has led to remarkable performance across a wi...
research
12/19/2022

The case for 4-bit precision: k-bit Inference Scaling Laws

Quantization methods reduce the number of bits required to represent eac...
research
10/28/2020

Scaling Laws for Autoregressive Generative Modeling

We identify empirical scaling laws for the cross-entropy loss in four do...
research
12/10/2022

Algorithmic progress in computer vision

We investigate algorithmic progress in image classification on ImageNet,...
research
04/07/2021

Scaling Scaling Laws with Board Games

The largest experiments in machine learning now require resources far be...
research
09/15/2023

Scaling Laws for Sparsely-Connected Foundation Models

We explore the impact of parameter sparsity on the scaling behavior of T...