Replacing Labeled Real-image Datasets with Auto-generated Contours

06/18/2022
by Hirokatsu Kataoka, et al.

In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human supervision, or self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k, whereas FDSL shows 82.7% top-1 accuracy when pre-trained under the same conditions (number of images, hyperparameters, and number of epochs). Images generated by formulas avoid the privacy/copyright issues, labeling costs and errors, and biases that real images suffer from, and thus have tremendous potential for pre-training general models. To understand the performance of the synthetic images, we tested two hypotheses, namely (i) object contours are what matter in FDSL datasets, and (ii) an increased number of parameters for creating labels improves performance in FDSL pre-training. To test the former hypothesis, we constructed a dataset consisting of simple combinations of object contours and found that it can match the performance of fractals. For the latter hypothesis, we found that increasing the difficulty of the pre-training task generally leads to better fine-tuning accuracy.
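To make the FDSL idea concrete, below is a minimal, hypothetical sketch of how a contour dataset with formula-derived labels might be generated. It is not the authors' released pipeline; the function names, parameterization, and rendering choices are all illustrative assumptions. The key point it demonstrates is that each image is rendered from a small set of generation parameters, and the label is the parameter configuration itself, so no real images or human annotation are involved.

```python
# Hypothetical sketch of formula-driven supervised learning (FDSL) data
# generation: every image is drawn from class parameters alone, and the
# label is known by construction (no real photos, no human labels).
import numpy as np
from PIL import Image, ImageDraw

def render_contour(n_vertices, radius_var, size=64, rng=None):
    """Render one closed contour whose shape is fully determined by the
    class parameters (n_vertices, radius_var) plus per-sample noise."""
    rng = rng if rng is not None else np.random.default_rng()
    angles = np.linspace(0.0, 2.0 * np.pi, n_vertices, endpoint=False)
    # radius_var controls how "rough" the contour is around a base circle.
    radii = size * 0.35 * (1.0 + radius_var * rng.uniform(-1.0, 1.0, n_vertices))
    xs = (size / 2 + radii * np.cos(angles)).tolist()
    ys = (size / 2 + radii * np.sin(angles)).tolist()
    img = Image.new("L", (size, size), color=0)
    ImageDraw.Draw(img).polygon(list(zip(xs, ys)), outline=255)  # contour only
    return img

def make_dataset(n_classes=10, per_class=5, seed=0):
    """Class k is defined by its generation parameters, so the label comes
    directly from the formula rather than from human annotation."""
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for k in range(n_classes):
        for _ in range(per_class):
            images.append(render_contour(n_vertices=k + 3,
                                         radius_var=0.04 * k, rng=rng))
            labels.append(k)
    return images, labels

if __name__ == "__main__":
    imgs, labs = make_dataset()
    print(f"generated {len(imgs)} contour images across {max(labs) + 1} classes")
```

In this toy setup, adding further generation parameters (for example, line width or the number of overlaid contours) multiplies the number of distinguishable classes and makes the pre-training task harder, which is the intuition behind hypothesis (ii) above.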


