A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation

03/16/2023
by   Hui Tang, et al.
4

Deep learning in computer vision has achieved great success with the price of large-scale labeled training data. However, exhaustive data annotation is impracticable for each task of all domains of interest, due to high labor costs and unguaranteed labeling accuracy. Besides, the uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist. All these nuisances may hinder the verification of typical theories and exposure to new findings. To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization. We in this work push forward along this line by doing profound and extensive research on bare supervised learning and downstream domain adaptation. Specifically, under the well-controlled, IID data setting enabled by 3D rendering, we systematically verify the typical, important learning insights, e.g., shortcut learning, and discover the new laws of various data regimes and network architectures in generalization. We further investigate the effect of image formation factors on generalization, e.g., object scale, material texture, illumination, camera viewpoint, and background in a 3D scene. Moreover, we use the simulation-to-reality adaptation as a downstream task for comparing the transferability between synthetic and real data when used for pre-training, which demonstrates that synthetic data pre-training is also promising to improve real test results. Lastly, to promote future research, we develop a new large-scale synthetic-to-real benchmark for image classification, termed S2RDA, which provides more significant challenges for transfer from simulation to reality. The code and datasets are available at https://github.com/huitangtang/On_the_Utility_of_Synthetic_Data.

READ FULL TEXT

page 3

page 6

page 7

page 8

page 17

page 18

page 19

page 20

research
10/28/2022

Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis

Recently, great progress has been made in 3D deep learning with the emer...
research
11/30/2021

Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

Pre-training models on Imagenet or other massive datasets of real images...
research
04/25/2020

StRDAN: Synthetic-to-Real Domain Adaptation Network for Vehicle Re-Identification

Vehicle re-identification aims to obtain the same vehicles from vehicle ...
research
03/22/2022

A Broad Study of Pre-training for Domain Generalization and Adaptation

Deep models must learn robust and transferable representations in order ...
research
03/26/2016

How useful is photo-realistic rendering for visual learning?

Data seems cheap to get, and in many ways it is, but the process of crea...
research
04/29/2022

Towards Automatic Parsing of Structured Visual Content through the Use of Synthetic Data

Structured Visual Content (SVC) such as graphs, flow charts, or the like...
research
07/14/2020

Automated Synthetic-to-Real Generalization

Models trained on synthetic images often face degraded generalization to...

Please sign up or login with your details

Forgot password? Click here to reset