Synthetic Examples Improve Generalization for Rare Classes

by Sara Beery, et al.

The ability to detect and classify rare occurrences in images has important applications: for example, counting rare and endangered species when studying biodiversity, or detecting infrequent traffic scenarios that pose a danger to self-driving cars. Few-shot learning is an open problem: current computer vision systems struggle to categorize objects they have seen only rarely during training, and collecting enough training examples of rare events is often challenging and expensive, and sometimes outright impossible. We explore in depth one approach to this problem: complementing the few available training images with ad-hoc simulated data. Our testbed is animal species classification, which has a real-world long-tailed distribution. We analyze the effect of different axes of variation in simulation, such as pose, lighting, model, and simulation method, and we prescribe best practices for efficiently incorporating simulated data to achieve real-world performance gains. Our experiments reveal that synthetic data can considerably reduce error rates for rare classes, that accuracy on the target class improves as the amount of simulated data increases, and that high variation in the simulated data provides the greatest performance gain.




