Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

03/02/2023
by Sora Takashima, et al.

Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthetic datasets can achieve the same accuracy as real datasets leaves much room for skepticism. In the present work, we develop a novel methodology based on circular harmonics for systematically investigating the design space of contour-oriented synthetic datasets. This allows us to efficiently search for the optimal range of FDSL parameters and maximize the variety of synthetic images in the dataset, which we found to be a critical factor. When the resulting new dataset VisualAtom-21k is used for pre-training ViT-Base, the top-1 accuracy reached 83.7% when fine-tuning on ImageNet-1k. This is close to the top-1 accuracy (84.2%) achieved by JFT-300M pre-training, while the number of images is 1/14. Unlike JFT-300M, which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility. FDSL is also free of the common issues associated with real images, e.g., privacy/copyright issues, labeling costs/errors, and ethical biases.
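To make the idea of a contour built from sinusoidal waves concrete, here is a minimal Python sketch. It is an illustration only, not the authors' generator: the abstract does not give the formula, so the function name `sinusoidal_contour`, its parameters (`freq1`, `freq2`, `amp1`, `amp2`, `base_radius`, `num_points`), and the choice of exactly two sine terms are all assumptions. The sketch perturbs the radius of a circle with two sinusoids; using integer frequencies keeps the resulting contour closed.

```python
import math


def sinusoidal_contour(freq1, freq2, amp1, amp2, base_radius=1.0, num_points=360):
    """Return (x, y) points of a circle whose radius is perturbed by two sine waves.

    Hypothetical illustration: the parameter names and the two-wave form are
    assumptions, not the formula used to build VisualAtom-21k.
    """
    points = []
    for k in range(num_points):
        # Angle of the k-th sample, evenly spaced over one full turn.
        theta = 2.0 * math.pi * k / num_points
        # Radius = base circle plus two sinusoidal perturbations.
        radius = (
            base_radius
            + amp1 * math.sin(freq1 * theta)
            + amp2 * math.sin(freq2 * theta)
        )
        # Convert from polar to Cartesian coordinates.
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


# Example: a wavy contour with 3 and 7 lobes superimposed.
contour = sinusoidal_contour(freq1=3, freq2=7, amp1=0.2, amp2=0.1)
```

Under these assumptions, varying the frequencies and amplitudes yields different contour shapes, which is one simple way to picture a parameterized design space of contour-oriented synthetic images.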


research
06/18/2022

Replacing Labeled Real-image Datasets with Auto-generated Contours

In the present work, we show that the performance of formula-driven supe...
research
07/27/2023

Pre-training Vision Transformers with Very Limited Synthesized Images

Formula-driven supervised learning (FDSL) is a pre-training method that ...
research
03/24/2021

Can Vision Transformers Learn without Natural Images?

Can we complete pre-training of Vision Transformers (ViT) without natura...
research
12/12/2022

You Only Need a Good Embeddings Extractor to Fix Spurious Correlations

Spurious correlations in training data often lead to robustness issues s...
research
11/24/2021

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

This paper explores a better codebook for BERT pre-training of vision tr...
research
07/11/2022

PSP-HDRI+: A Synthetic Dataset Generator for Pre-Training of Human-Centric Computer Vision Models

We introduce a new synthetic data generator PSP-HDRI+ that proves to be ...
research
11/29/2022

Procedural Image Programs for Representation Learning

Learning image representations using synthetic data allows training neur...
