Data Generation for Learning to Grasp in a Bin-picking Scenario

by Yiting Chen, et al.
Wuhan University

The rise of deep learning has transformed the robotic grasping pipeline from model-based approaches to data-driven ones. As a result, large-scale grasping data, collected either in simulation or from real-world examples, has become extremely important. In this paper, we present our recent work on data generation in simulation for a bin-picking scene. We use 77 objects from the YCB object dataset to generate the data with PyBullet, taking into account different environment conditions, including lighting, camera pose and sensor noise. In all, 100K data samples are collected, each comprising ground-truth segmentation, RGB, 6D pose and point cloud. All data samples and the source code are made available online.




I Introduction

A brief look at recent pose estimation and object localization methods shows that data-driven approaches account for an increasing share of them, for example CullNet [2] and DenseFusion [3]. These methods offer reliable ways to estimate the 6D pose of objects, and many similar examples exist. With the help of large-scale data, the time needed to learn pose estimation or grasping has been significantly shortened.

II Building Dataset

II-A System Setup

  • Simulation: We choose PyBullet as our simulator, which provides real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning, etc.

  • Objects: All of our objects are selected from the YCB dataset [1], which provides nearly one hundred kinds of texture-mapped 3D mesh models. Figure 1 shows the 77 kinds of objects we use.

Fig. 1: 77 different kinds of objects.

II-B Data Generation

In our virtual environment, an empty tray box sits in the middle of a blank plane, and the camera is placed 0.7 meters above the tray box. Our dataset contains 77 kinds of models, all selected from the YCB dataset. We define a blank space of 0.4 m × 0.4 m × 0.45 m, located 0.05 meters above the tray box. For each scene, we randomly select 12 different kinds of models and let them appear at random positions above the box; the x, y and z coordinates of every object are generated randomly within the blank space. Figure 2 shows a case in which the 12 objects are sugar-box, g-cup, mug, sponge, a-colored-wood-blocks, c-lego-duplo, g-lego-duplo, scissors, large-marker, fork, h-cups and tennis-ball.

Fig. 2: 12 different objects appear randomly in the blank space above.
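The sampling step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the counts and dimensions come from the text, while `TRAY_TOP_Z`, the centring of the blank space on the tray, and the function name `sample_scene` are assumptions made for the example.

```python
import random

# Values taken from the text.
NUM_MODELS = 77                 # YCB models in the dataset
OBJECTS_PER_SCENE = 12          # distinct models dropped per scene
SPAWN_SIZE = (0.4, 0.4, 0.45)   # blank space in metres (x, y, z)
SPAWN_GAP = 0.05                # gap between the tray and the blank space

# Assumption: height of the tray rim above the plane (not given in the text).
TRAY_TOP_Z = 0.1


def sample_scene(rng):
    """Pick 12 distinct model indices and a random start position for each.

    Assumes the blank space is centred on the tray in x and y.
    """
    model_ids = rng.sample(range(NUM_MODELS), OBJECTS_PER_SCENE)
    z_min = TRAY_TOP_Z + SPAWN_GAP
    placements = []
    for model_id in model_ids:
        x = rng.uniform(-SPAWN_SIZE[0] / 2, SPAWN_SIZE[0] / 2)
        y = rng.uniform(-SPAWN_SIZE[1] / 2, SPAWN_SIZE[1] / 2)
        z = rng.uniform(z_min, z_min + SPAWN_SIZE[2])
        placements.append((model_id, (x, y, z)))
    return placements


# A seeded generator makes a scene reproducible.
scene = sample_scene(random.Random(0))
```

Each `(model_id, position)` pair could then be passed to the simulator's model-loading call as the base position of the corresponding object.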

As soon as gravity is turned on, the objects fall naturally into the tray box. Because of collisions, the final pose of each object is effectively random, so the stacking states of the objects closely resemble real-world situations. Figure 3 shows the scene after the fall. For each falling case, the scene is lit by a point light whose angle changes constantly, which lets us cover a wide range of the lighting conditions that can occur in the real world.

Fig. 3: 12 objects fall from above and become stable after collision.
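Two pieces of this step lend themselves to a short sketch: deciding when the pile has become stable, and drawing a new light position for each case. The code below is an illustration under stated assumptions, not the authors' procedure. The velocity tolerances, the light distance and the minimum elevation are all hypothetical values. The `velocities` argument is assumed to hold one `(linear, angular)` pair per object, which is the shape returned by PyBullet's `getBaseVelocity`, and the sampled position is the kind of vector that could be supplied as the `lightDirection` argument of `getCameraImage`.

```python
import math
import random


def is_settled(velocities, lin_tol=1e-3, ang_tol=1e-2):
    """Return True when every object's linear and angular speed is small.

    The tolerances are hypothetical; the text does not give a stability test.
    """
    for linear, angular in velocities:
        if math.sqrt(sum(c * c for c in linear)) > lin_tol:
            return False
        if math.sqrt(sum(c * c for c in angular)) > ang_tol:
            return False
    return True


def sample_light_position(rng, distance=2.0, min_elevation_deg=20.0):
    """Sample a light position on the upper hemisphere around the origin.

    The azimuth is uniform over the full circle and the elevation is uniform
    between `min_elevation_deg` and 90 degrees (both ranges are assumptions).
    """
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    elevation = math.radians(rng.uniform(min_elevation_deg, 90.0))
    x = distance * math.cos(elevation) * math.cos(azimuth)
    y = distance * math.cos(elevation) * math.sin(azimuth)
    z = distance * math.sin(elevation)
    return (x, y, z)


light = sample_light_position(random.Random(1))
```

In a simulation loop, one would step the physics until `is_settled` returns True (or a step limit is reached) and only then render the scene with the sampled light.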

II-C Simulation Result

Thanks to PyBullet's powerful built-in functions, we can easily obtain segmentation, depth and RGB images of the tray box. Figure 4 shows these three kinds of images and the point cloud we obtain. All images are saved as .png files, and the point cloud is saved as a .ply file.

Fig. 4: Segmentation, RGB, Depth and Point Cloud from top to bottom.
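One possible way to go from a rendered depth image to a point cloud is sketched below. It is not taken from the authors' code. It assumes the depth values are the normalised buffer values in [0, 1] that PyBullet's `getCameraImage` returns, that `near` and `far` are the clipping planes used for the projection matrix, and that the camera is a pinhole with square pixels described by a vertical field of view. The function names, and the camera-frame convention (x right, y down, z forward), are choices made for this example.

```python
import math


def buffer_to_metric_depth(d, near, far):
    """Convert a normalised depth-buffer value in [0, 1] to metres."""
    return far * near / (far - (far - near) * d)


def depth_to_points(depth_buffer, fov_deg, near, far):
    """Back-project a depth buffer (list of rows) into camera-frame points."""
    height = len(depth_buffer)
    width = len(depth_buffer[0])
    # Focal length in pixels from the vertical field of view.
    focal = (height / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = width / 2.0, height / 2.0
    points = []
    for v, row in enumerate(depth_buffer):
        for u, d in enumerate(row):
            if d >= 1.0:  # far plane: nothing was rendered at this pixel
                continue
            z = buffer_to_metric_depth(d, near, far)
            x = (u + 0.5 - cx) * z / focal
            y = (v + 0.5 - cy) * z / focal
            points.append((x, y, z))
    return points


def points_to_ply(points):
    """Serialise points as an ASCII PLY string with x, y, z properties."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Tiny 2x2 example; one pixel lies on the far plane and is skipped.
cloud = depth_to_points([[0.0, 1.0], [0.5, 0.5]], fov_deg=90.0, near=0.1, far=2.0)
ply_text = points_to_ply(cloud)
```

For full-resolution images a vectorised version would be preferable; the loop form is used here only to keep the arithmetic visible.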

Figure 5 showcases part of our simulation result. The 6D poses of the objects in each falling case are saved as a .csv file, with orientation described by a quaternion.

Fig. 5: Part of our result, which contains images from 1200 groups of data.
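As an illustration of storing poses in this form, the sketch below writes one row per object with a position and a quaternion, and converts a quaternion back to a rotation matrix. The column layout in `POSE_FIELDS` and the function names are hypothetical, since the text does not specify the file schema. The quaternion order (x, y, z, w) follows the convention PyBullet uses for orientations.

```python
import csv
import io

# Hypothetical column layout; the actual .csv schema is not given in the text.
POSE_FIELDS = ["object_name", "x", "y", "z", "qx", "qy", "qz", "qw"]


def write_poses(stream, poses):
    """Write (name, position, quaternion) triples as CSV rows."""
    writer = csv.writer(stream)
    writer.writerow(POSE_FIELDS)
    for name, position, quaternion in poses:
        writer.writerow([name, *position, *quaternion])


def quaternion_to_matrix(q):
    """Convert an (x, y, z, w) quaternion to a 3x3 rotation matrix."""
    x, y, z, w = q
    s = 2.0 / (x * x + y * y + z * z + w * w)  # also normalises q
    return [
        [1 - s * (y * y + z * z), s * (x * y - z * w), s * (x * z + y * w)],
        [s * (x * y + z * w), 1 - s * (x * x + z * z), s * (y * z - x * w)],
        [s * (x * z - y * w), s * (y * z + x * w), 1 - s * (x * x + y * y)],
    ]


# Example: one object resting 2 cm above the origin with identity rotation.
buffer = io.StringIO()
write_poses(buffer, [("mug", (0.0, 0.0, 0.02), (0.0, 0.0, 0.0, 1.0))])
csv_text = buffer.getvalue()
```

Seven numbers per object (three for position, four for orientation) are enough to recover the full 6D pose, and the rotation matrix form is convenient when transforming the object's mesh or a grasp frame.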

III Conclusion

We present a new dataset with point cloud, 6D pose, segmentation, depth and RGB data created using PyBullet. The dataset includes 77 kinds of YCB models and covers random collisions and lighting variations. It contains 100K groups of data and provides a wide range of parameter variations. In the future, we plan to validate the effectiveness of this dataset using real-world object examples. The website for the data generation procedure is available online as


  • [1] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar (2015) The YCB object and model set: towards common benchmarks for manipulation research. In 2015 International Conference on Advanced Robotics (ICAR), pp. 510–517.
  • [2] K. Gupta, L. Petersson, and R. Hartley (2019) CullNet: calibrated and pose aware confidence scores for object pose estimation. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 2758–2766.
  • [3] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese (2019) DenseFusion: 6D object pose estimation by iterative dense fusion. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3338–3347.