Despite recent remarkable successes in robotic grasping, most works on grasp synthesis assume, either implicitly or explicitly, that objects are rigid [mahler2017dex, morrison2018closing, jens19]. The rigidity assumption reduces grasp planning to choosing contact points on the object surface, but it does not hold for many real objects. Planning grasps on non-rigid objects is difficult because objects deform under interaction forces, meaning that the 3-D contact locations also depend on the forces exerted on the object. Most existing works on planning grasps on deformable objects aim to minimize the deformation [xu_minimalwork, pan_minimizedeform, delgado_minimize], while some works [lin_feel3d] take advantage of it. Nevertheless, how object stiffness affects grasping and how to use object deformation to generate grasps remain open questions. In this context, it is important to note that grasps generated by methods assuming rigid objects do not necessarily translate well to deformable objects and vice versa. Therefore, there is a need to study how to generate grasps that harness the target objects’ stiffness.
Of late, deep learning has been the major driving force behind the progress in rigid object grasping. Many of these techniques share a similar pipeline, in which a deep neural network is trained on a real or synthetic dataset to generate and evaluate grasp candidates given an input image. Rigid body simulators such as GraspIt! [graspit] and OpenGRASP [opengrasp] have been used to generate thousands of grasp candidates that serve as training data for those methods. The large amount of training data helped those methods achieve high grasp success rates on rigid objects. Recently, simulators such as PyBullet [pybullet] and MuJoCo [mujoco] have been used to study the manipulation of cloth and rope-type objects [wu_mujoco, yan_mujoco, jan_pybullet]. However, the use of these simulators for 3D solid deformable objects is still limited.
To address the aforementioned open issues, we envision an approach that generates grasps on a wider range of objects with varying stiffness by incorporating stiffness as an additional input to a state-of-the-art deep grasp planning pipeline (Fig. LABEL:fig:pipeline). Our system generates grasp candidates and grasp qualities for every pixel given an input depth image and stiffness image. When combined with depth information, the model outputs can be reprojected into 3D space, allowing a robot to execute a generated grasp in the real world.
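The reprojection step mentioned above can be illustrated with a standard pinhole camera model. This is a minimal sketch rather than the authors' implementation: the function name `deproject_pixel` and the intrinsics `FX`, `FY`, `CX`, `CY` are assumptions chosen for illustration, and the resulting point is expressed in the camera frame.

```python
from typing import Tuple


def deproject_pixel(u: float, v: float, depth_m: float,
                    fx: float, fy: float, cx: float, cy: float
                    ) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into camera-frame XYZ.

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 300x300 depth image (not from the paper).
FX = FY = 300.0
CX = CY = 150.0

# A grasp predicted at the image centre lies on the optical axis.
center = deproject_pixel(150.0, 150.0, 0.5, FX, FY, CX, CY)

# A grasp predicted 60 pixels to the right of the centre at the same depth.
off_axis = deproject_pixel(210.0, 150.0, 0.5, FX, FY, CX, CY)
```

A further camera-to-robot transform, not shown here, would be needed before a robot could execute the grasp.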
The approach is evaluated in simulation and shows an improvement in grasp success rate for a wide range of objects with various shapes and varying stiffness. The approach is able to generate different grasping strategies for different stiffness values, such as pinching for soft objects and caging for hard objects, even though no pinch grasps were included in the training data.
II Grasp generation using physics-based simulation
II-A Simulation platform choice
Simulating the dynamics of deformable objects relies heavily on their geometric representations. [yin_survey] presents three primary deformable object modelling approaches, namely the mass-spring system (MSS), position-based dynamics (PBD), and the finite element method (FEM), along with their limitations. In this work, we use FEM because it is often used to model 3D objects such as food or tissues and, compared to the other modelling approaches, offers a more physically accurate representation of a deformable object in a continuous domain.
Most robotic simulators do not support FEM. An exception is NVIDIA’s recent version of the Isaac Gym simulator [isaacgym], which supports soft body simulation through the NVIDIA Flex backend. Similar to SOFA [sofa], Isaac Gym includes a co-rotational linear model for accurately modelling and simulating object deformation under interaction. Furthermore, the Isaac simulator provides the capability to integrate robot-related functions, making it easier to build robotic applications. NVIDIA also provides a grasping framework [grasp_framework] to automatically perform and evaluate grasp tests on an arbitrary target object. We use this framework in our work to generate training data and test grasps.
II-B Grasp generation network
To take object stiffness into account when generating grasps, we propose to use a deep neural network (DNN) (Fig. LABEL:fig:pipeline). The network is inspired by [morrison2018closing] but modified to take a stiffness image as an additional input channel. Each pixel in the stiffness image represents object stiffness. The proposed network is trained with supervised learning on a synthetic dataset. Because no such dataset existed, we generated our own dataset containing labeled grasps on soft and rigid objects using Isaac Gym.
II-C Training data generation
Depth and stiffness input. We captured depth images of target objects with a virtual camera viewing the scene top-down. To model variable object stiffness, four values of Young’s modulus from to were used. The Young’s modulus is normalized to the [0,1] range, and the corresponding stiffness value is assigned to every pixel that the object occupies in the stiffness image.
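The construction of the stiffness image can be sketched as follows. The text does not specify the normalization, so linear min-max scaling is an assumption here, as are the helper names `normalize_modulus` and `stiffness_image`, the zero background value, and the placeholder moduli, which are purely illustrative and are not the values used in this work.

```python
from typing import List, Sequence


def normalize_modulus(value: float, lo: float, hi: float) -> float:
    """Linearly map a Young's modulus in [lo, hi] to [0, 1] (assumed scheme)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (value - lo) / (hi - lo)


def stiffness_image(mask: Sequence[Sequence[bool]],
                    stiffness: float) -> List[List[float]]:
    """Assign the normalized stiffness to object pixels and 0.0 elsewhere."""
    return [[stiffness if occupied else 0.0 for occupied in row]
            for row in mask]


# Placeholder moduli in pascals; hypothetical, NOT the paper's values.
MODULI_PA = [1e4, 1e5, 1e6, 1e7]
normalized = [normalize_modulus(m, MODULI_PA[0], MODULI_PA[-1])
              for m in MODULI_PA]

# Tiny 2x2 object mask: True where the object occupies the pixel.
mask = [[False, True],
        [True, True]]
image = stiffness_image(mask, 0.5)
```

One consequence of this particular sketch is that the softest object maps to 0 and becomes indistinguishable from the background; a logarithmic scale or a small offset would avoid that, but which scheme was actually used is not stated in the text.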
Grasp candidates. Grasps are sampled with an antipodal grasp sampler to obtain approximately 200 grasp candidates per target object. Candidates that collide with the mesh are filtered out, which leaves about 25-40 grasps per object. The grasps are executed and evaluated using a Franka Emika Panda model in Isaac Gym. Positive grasps are then represented as rectangles in the 2D image plane, as shown in Fig. LABEL:fig:grasprep.
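The rectangle representation of a grasp in the image plane can be sketched as below. The parameterization by centre, angle, opening width, and jaw height is a common convention and is assumed here; the function name `grasp_rectangle` and the example numbers are hypothetical.

```python
import math
from typing import List, Tuple


def grasp_rectangle(cx: float, cy: float, angle_rad: float,
                    width: float, height: float
                    ) -> List[Tuple[float, float]]:
    """Return the four corners of a grasp rectangle in pixel coordinates.

    `width` is measured along the gripper opening direction given by
    `angle_rad`; `height` is measured perpendicular to it.
    """
    dx, dy = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = width / 2.0, height / 2.0
    corners = []
    # Walk around the rectangle: signs select each corner in turn.
    for sw, sh in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        x = cx + sw * half_w * dx - sh * half_h * dy
        y = cy + sw * half_w * dy + sh * half_h * dx
        corners.append((x, y))
    return corners


# Axis-aligned grasp centred at pixel (10, 20), 8 px wide and 4 px high.
axis_aligned = grasp_rectangle(10.0, 20.0, 0.0, 8.0, 4.0)

# The same grasp rotated by 30 degrees keeps its side lengths.
rotated = grasp_rectangle(10.0, 20.0, math.pi / 6, 8.0, 4.0)
```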
Quality metrics. None of the standard grasp quality metrics is applicable to both rigid and deformable objects. As a quality metric, we therefore use a shake task, which measures how easily an object is displaced in the hand under various accelerations. The metric is provided by the Isaac Gym framework. A higher value indicates a better grasp because it withstands higher accelerations.
Training dataset. As a training dataset, we use a total of 30 objects on which we generate and label grasp candidates. The objects include 13 primitive objects provided in Isaac Gym, 5 objects from the YCB dataset, and 12 objects with adversarial geometry from the EGAD! dataset [egad]. With the four stiffness values, the training set contains a total of 120 objects. To counteract the small size of the training set, we further augment the dataset with random crops, zooms, and rotations to create a set of 5400 depth and stiffness images and 27000 labeled grasp map images.
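The dataset sizes quoted above are mutually consistent, as the short check below shows. The two ratios it derives follow only from the stated totals; how they break down in the actual pipeline is not described in the text, so the variable names are illustrative.

```python
# Counts stated in the text.
BASE_OBJECTS = 13 + 5 + 12      # Isaac Gym primitives + YCB + EGAD!
STIFFNESS_LEVELS = 4            # four values of Young's modulus
INPUT_IMAGES = 5400             # augmented depth and stiffness images
GRASP_MAPS = 27000              # labeled grasp map images

# Each base object appears once per stiffness level.
object_variants = BASE_OBJECTS * STIFFNESS_LEVELS

# Ratios implied by the totals (derived, not stated in the text).
images_per_variant = INPUT_IMAGES / object_variants
maps_per_image = GRASP_MAPS / INPUT_IMAGES
```

Under these totals, each object and stiffness combination corresponds to 45 augmented input images, and each input image to 5 labeled grasp maps.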
We evaluated the quality of the proposed grasp generation in simulation on objects with varying stiffness. We tested the approach on 7 common objects shown in Fig. LABEL:fig:testobj. We evaluated the top-5 generated grasps using the shake test on each object for each of the four stiffness values, resulting in 20 grasps per object. To demonstrate the importance of the stiffness input, we compared the generated grasps against grasps generated with a similar approach without stiffness information.
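Selecting the top-5 grasps from a per-pixel quality map can be sketched as follows. Ranking raw pixels by quality is an assumption made for brevity; the text does not say how the top-5 grasps were chosen, and a practical system might first suppress neighbouring pixels so the selected grasps are spatially distinct. The function name `top_k_grasps` and the toy quality map are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_grasps(quality: Sequence[Sequence[float]],
                 k: int = 5) -> List[Tuple[int, int, float]]:
    """Return the k highest-quality pixels as (row, col, quality) tuples,
    ordered from best to worst."""
    candidates = ((q, r, c)
                  for r, row in enumerate(quality)
                  for c, q in enumerate(row))
    best = heapq.nlargest(k, candidates)
    return [(r, c, q) for q, r, c in best]


# Toy 3x3 quality map with distinct values.
quality_map = [
    [0.10, 0.90, 0.20],
    [0.50, 0.30, 0.80],
    [0.70, 0.40, 0.60],
]
top5 = top_k_grasps(quality_map, k=5)
```

With 7 test objects, four stiffness values, and five grasps per combination, such a selection yields the 20 grasps per object stated above, or 140 grasps in total.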