Pre-training Vision Transformers with Very Limited Synthesized Images

07/27/2023
by   Ryo Nakamura, et al.

Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae, such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters of the mathematical formula that generates them. In the present work, we hypothesize that the process of generating different instances of the same category in FDSL can be viewed as a form of data augmentation. We validate this hypothesis by replacing the generated instances with data augmentation, which means only a single image per category is needed. Our experiments show that this one-instance fractal database (OFDB) performs better than the original dataset, in which instances were explicitly generated. We further scale OFDB up to 21,000 categories and show that it matches, or even surpasses, a model pre-trained on ImageNet-21k in ImageNet-1k fine-tuning. OFDB contains only 21k images, whereas ImageNet-21k has 14M. This opens new possibilities for pre-training vision transformers with much smaller datasets.
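The abstract describes formula-driven image generation only at a high level. As a minimal sketch of the underlying idea, the toy Python renderer below draws a fractal from an iterated function system (IFS) via the chaos game: the IFS parameters define a category, and varying the seed (or, as OFDB proposes, applying standard augmentations to one rendering) yields instances. All names and parameters here are illustrative assumptions, not the authors' code.

```python
import random

def render_ifs(transforms, n_points=10000, size=64, seed=0):
    """Render a binary fractal image from an iterated function system (IFS).

    Each transform (a, b, c, d, e, f) maps (x, y) -> (a*x + b*y + e, c*x + d*y + f).
    Points are accumulated with the "chaos game" and rasterized on a size x size grid.
    """
    rng = random.Random(seed)
    img = [[0] * size for _ in range(size)]
    x, y = 0.0, 0.0
    for i in range(n_points):
        a, b, c, d, e, f = rng.choice(transforms)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i < 20:  # let the orbit settle onto the attractor before plotting
            continue
        # map roughly [-1, 1]^2 into pixel coordinates, discarding outliers
        px = int((x + 1) / 2 * (size - 1))
        py = int((y + 1) / 2 * (size - 1))
        if 0 <= px < size and 0 <= py < size:
            img[py][px] = 1
    return img

# A category is defined by its IFS parameters; this set of three affine
# maps produces a Sierpinski-triangle-like attractor (illustrative only).
sierpinski = [
    (0.5, 0.0, 0.0, 0.5, -0.5, -0.5),
    (0.5, 0.0, 0.0, 0.5,  0.5, -0.5),
    (0.5, 0.0, 0.0, 0.5,  0.0,  0.5),
]
img = render_ifs(sierpinski)
```

In a FractalDB-style dataset, many such parameter sets would each define a label; OFDB's claim is that one rendering per parameter set, combined with ordinary data augmentation, suffices for pre-training.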


Related research

research · 06/18/2022
Replacing Labeled Real-image Datasets with Auto-generated Contours
In the present work, we show that the performance of formula-driven supe...

research · 03/02/2023
Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
Formula-driven supervised learning (FDSL) has been shown to be an effect...

research · 11/30/2021
Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data
Pre-training models on Imagenet or other massive datasets of real images...

research · 11/30/2021
Beyond Flatland: Pre-training with a Strong 3D Inductive Bias
Pre-training on large-scale databases consisting of natural images and t...

research · 03/31/2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
We present the largest and most comprehensive empirical study of pre-tra...

research · 12/01/2020
Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers
The task of verbalization of RDF triples has known a growth in popularit...

research · 06/12/2023
Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection
We present a novel approach for the detection of deepfake videos using a...
