Scaling Vision Transformers

06/08/2021
by   Xiaohua Zhai, et al.

Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining excellent results; therefore, understanding a model's scaling properties is key to designing future generations effectively. While the laws for scaling Transformer language models have been studied, it is unknown how Vision Transformers scale. To address this, we scale ViT models and data, both up and down, and characterize the relationships between error rate, data, and compute. Along the way, we refine the architecture and training of ViT, reducing memory consumption and increasing the accuracy of the resulting models. As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy. The model also performs well on few-shot learning, for example, attaining 84.86% top-1 accuracy on ImageNet with only 10 examples per class.
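The abstract describes characterizing the relationship between error rate, data, and compute. Scaling-law studies of this kind are typically summarized by fitting a saturating power law, E(C) = a·C^(−b) + c, where c is an irreducible error floor. The sketch below is not the authors' code; it fits that functional form to synthetic data with `scipy.optimize.curve_fit`, and all constants in it are illustrative assumptions.

```python
# Minimal sketch: fitting a saturating power law E(C) = a * C^-b + c,
# the functional form commonly used to summarize error-vs-compute
# scaling curves. Synthetic data only; constants are assumptions.
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(compute, a, b, c):
    """Error as a function of compute: a * C^-b plus an error floor c."""
    return a * compute ** (-b) + c

# Synthetic "observations": true exponent 0.3, true error floor 0.05.
rng = np.random.default_rng(0)
compute = np.logspace(0, 4, 20)                     # arbitrary compute units
error = 0.5 * compute ** (-0.3) + 0.05
error += rng.normal(0.0, 1e-3, size=error.shape)    # small measurement noise

(a, b, c), _ = curve_fit(saturating_power_law, compute, error,
                         p0=(1.0, 0.5, 0.0))
print(f"fitted exponent b={b:.2f}, error floor c={c:.3f}")
```

The error floor c matters in practice: it is what makes larger models keep helping only up to a point, and a plain (non-saturating) power law would systematically misfit the high-compute end of the curve.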


research
05/21/2022

Vision Transformers in 2022: An Update on Tiny ImageNet

The recent advances in image transformers have shown impressive results ...
research
05/22/2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Scaling laws have been recently employed to derive compute-optimal model...
research
05/22/2022

Dynamic Query Selection for Fast Visual Perceiver

Transformers have been matching deep convolutional networks for vision a...
research
02/06/2023

Computation vs. Communication Scaling for Future Transformers on Future Hardware

Scaling neural network models has delivered dramatic quality gains acros...
research
08/11/2023

Composable Function-preserving Expansions for Transformer Architectures

Training state-of-the-art neural networks requires a high cost in terms ...
research
07/21/2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

There has been a lot of interest in the scaling properties of Transform...
research
07/22/2022

Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers

A 20% rise in car crashes is a result of increased distraction and drowsiness. Drowsy and distract...
