Learning better generative models for dexterous, single-view grasping of novel objects

07/13/2019 ∙ by Marek Kopicki, et al.

This paper concerns the problem of learning to grasp dexterously, so as to be able to grasp novel objects seen from only a single viewpoint. Recently, progress has been made in data-efficient learning of generative grasp models which transfer well to novel objects. These generative grasp models are learned from demonstration (LfD). They have two weaknesses. First, as this paper shows, grasp transfer under challenging single-view conditions is unreliable. Second, the number of generative model elements rises linearly with the number of training examples. This, in turn, limits the potential of these generative models for generalisation and continual improvement. In this paper, it is shown how to address these problems. Several technical contributions are made: (i) a view-based model of a grasp; (ii) a method for combining and compressing multiple grasp models; (iii) a new way of evaluating contacts that is used both to generate and to score grasps. Together, these both improve grasp performance and reduce the number of models learned for grasp transfer. These advances, in turn, also allow the introduction of autonomous training, in which the robot learns from self-generated grasps. Evaluation on a challenging test set shows that, with innovations (i)-(iii) deployed, grasp transfer success rises from 55.1% to 81.6%, and that autonomous training raises it further to 87.8%. These differences are statistically significant. In total, across all experiments, 539 test grasps were executed on real objects.




1 Introduction

Dexterous grasping of novel objects is an active research area. The scenario considered here is that in which a novel object must be dexterously grasped, after being seen from just a single viewpoint. We refer to this scenario as dexterous, single-view grasping of novel objects. This is essentially the grasping problem solved by humans and establishes a high bar for robots.

The combination of these scenario features makes it hard to apply planning methods based on analytic mechanics, because such methods require knowledge of frictional coefficients, object mass, and complete object shape to evaluate a proposed grasp. None of these is known a priori, and none is easily recovered from a single view.

Alternatively, there are learning methods. Broadly speaking, these divide into those that learn generative models and those that learn evaluative models. Generative models take sensor data as input and produce one or more candidate grasps. Evaluative models take sensor data and a grasp candidate as input and produce an estimate of the quality of that grasp on the target object. In this paper, we consider how to improve generative model learning.

This paper builds on recent work on learning grasps from demonstration (LfD). There are now LfD methods for learning probabilistic generative models of grasps. Specifically, the baseline method that this paper builds on learns such a model from a small number of examples, and can then use it to generate dexterous grasps for novel objects. One drawback is that, while this baseline method works for many single-view grasps, it is not reliable in challenging cases. These include objects placed so that surface recovery is limited, and objects for which grasping requires contact on a hidden back surface, as shown in Figure 1. In addition, its ability to take advantage of an increasing quantity of training data is limited, since it is a purely memory-based learner and cannot combine models from different training examples.

Figure 1: Left: a memory-based generative model learner fails to grasp a beaker. Right: the new approach described here succeeds on the same case.

This paper shows how to make single-view grasping more reliable. This involves several innovations. First, (Innovation 1) we show how to learn multiple, view-specific grasp models from a single example grasp. These view-specific models enable grasps to be generated that compensate for missing back surfaces of deep objects, a typical occurrence in single-view grasping (Figure 1). Second, (Innovation 2) we move beyond memory-based models by showing how to combine information from multiple training grasps into a smaller number of generative models. This compression leads to an improvement in model generalisation and inferential efficiency on test objects. Third, (Innovation 3) we present a novel way to calculate the likelihood of finger-object contacts in a candidate grasp. This new likelihood function is used both to generate and evaluate candidate grasps. Together, these three innovations improve the performance of dexterous, single-view grasping from 55.1% to 81.6% on a test set of novel objects placed in difficult poses. Finally, we show how learning can be scaled by using self-generated grasps on test objects as further training data. This raises the grasp success rate to 87.8%.

Given these innovations, the paper tests the following hypotheses.

  • Even without an enlarged set of training grasps, the combined innovations 1-3 improve the grasp success rate.

  • View-based grasp modelling enables better generation of grasps for thick objects.

  • The grasp success rate improves progressively as innovations are added.

  • With all innovations, the grasp success rate improves as the training data increases.

  • With all innovations, learning exploits an increased amount of training data better than the baseline algorithm does.

  • With all innovations, the algorithm dominates the baseline algorithm without any innovations.

The paper is structured as follows. First, related work is described. Second, the approach is described in detail. This begins with a description of the basic framework for probabilistic generative grasp learning. It then proceeds to a description of the new learning algorithm, notably the view-based model representation and the technique for generative model compression. Following this, the new contact likelihood function and its uses are described. Finally, the results of an empirical study are presented.

2 Related work

2.1 Overview

We identify four broad approaches to grasp planning. First, there are those that use analytic mechanics to evaluate grasp quality. Second, there are methods that engineer a mapping from sensor data to a candidate grasp or grasps. Third, there are methods that learn this mapping. Finally, there are methods that instead learn a mapping from sensor data and a candidate grasp to a prediction of grasp quality or grasp success probability. To place our work in context, we review the properties of each of these, plus relevant methods in grasp execution. We cannot do justice to the entire literature, but sketch the main developments. Recent surveys of grasping include those by Bohg et al. (2014) on data-driven grasping and Sahbani et al. (2012) on analytic methods.

2.2 Analytic approaches

Analytic approaches to grasping use the laws of mechanics to predict the outcome of a grasp (Bicchi and Kumar, 2000; Liu, 2000; Pollard, 2004; Miller and Allen, 2004). These analyses require a model of the target object’s shape, mass, mass distribution, and surface friction. They also need a model of the gripper kinematics and the exertable contact forces in different configurations. Obtaining these permits computation of the resistable external wrenches for a grasp. Based on this, a number of so-called grasp quality metrics can be defined (Ferrari and Canny, 1992; Roa and Suarez, 2015; Shimoga, 1996).

Figure 2: The objects used for training (left) and testing (right). A critical aspect of the actual training and testing sets is the object pose relative to a fixed initial camera location.

Analytic approaches have been successfully applied to find grasps for multi-fingered hands (Boutselis et al., 2014; Gori et al., 2014; Hang et al., 2014; Rosales et al., 2012; Saut and Sidobre, 2012; Ciocarlie and Allen, 2009). All of these essentially pose grasp generation as optimisation against a grasp-quality metric. The appeal is that analytic methods are interpretable and rigorous, but there are drawbacks. Estimation of the object's properties is challenging. One solution is to build a library of objects that might be grasped, matching partially reconstructed novel objects to similar ones in the library (Goldfeder and Allen, 2011). Alternatively, parameters of the target object such as mass (Zheng and Qian, 2005; Shapiro et al., 2004) or friction (Rosales et al., 2014) must be recovered on the fly using vision and touch. One approach is to recover a complete object model from a partial view (e.g. a single depth image) by assuming shape symmetries (Bohg et al., 2011), by using a 3D CNN (Varley et al., 2017), or by a hierarchical shape approximation (Huebner et al., 2008). The completed object model can then be fed as input to a grasp planner such as GraspIt. Search for a grasp can be improved by employing low-dimensional hand pose representations (Ciocarlie and Allen, 2009). However, many analytic methods rest on assumptions such as hard contacts with a fixed contact area (Bicchi and Kumar, 2000) and static friction (Shimoga, 1996). Some methods extend modelling to, for example, soft contacts (Ciocarlie et al., 2005). A more fundamental problem is that even a small error in the estimated shape, mass or friction can render an apparently good grasp unstable (Zheng and Qian, 2005). This can be mitigated to an extent by using independent contact regions (ICRs) (Ponce and Faverjon, 1995; Rusu et al., 2009b).
Moreover, there is some evidence that grasp quality metrics based on mechanics are not strongly indicative of real-world grasp success (Bekiroglu et al., 2011a; Kim et al., 2013; Goins et al., 2014). In addition, such metrics can be costly to compute many times during a grasp search procedure. Analytic models are quite general, however, and so can be used for tasks such as planning grasps in clutter (Dogar et al., 2012).

2.3 Engineered mappings from sensor to grasp

Difficulties with analytic methods led to the investigation of vision-based grasp planning. These methods use RGB or depth images, or representations like point clouds or meshes. Grasp generation becomes a search for object shapes that fit the robot's gripper (Popović et al., 2010; Trobina and Leonardis, 1995; Klingbeil et al., 2011; Richtsfeld and Zillich, 2008; Kootstra et al., 2012; ten Pas and Platt, 2014). This includes finding parallel object edges in intensity images (Popović et al., 2010) or planar sections in range images (Klingbeil et al., 2011). Potential grasps can also be found by matching curved patches in a point cloud that can support contacts (Kanoulas et al., 2017). These candidate grasps are then refined using their shape and pose properties. The rule-based approach works well for pinch gripping, but does not scale well to dexterous grasping because of the increased size of the search space. Visual servoing can be used to improve grasp reliability (Kragic and Christensen, 2003).

Partially known shapes can be grasped using heuristics both for grasp generation and for the reactive finger-closing strategy used to execute the grasp under tactile sensing (Hsiao et al., 2010). Such reactive strategies can also be derived automatically for pose and/or shape uncertainty in a decision-theoretic framework (Hsiao et al., 2011; Arruda et al., 2016). These reactive strategies can include push manipulation to make a good grasp more likely (Dogar and Srinivasa, 2010). Finally, the grasp itself may be formulated by taking uncertainty into account as a constraint in the planning process (Li et al., 2016).

Figure 3: The structure of grasp training and testing in four stages. Stage 1: an example grasp is shown kinesthetically. Multiple contact models (one for each hand-link) and a hand configuration model are learned. Stage 2: when a new object is presented a partial point cloud model is constructed and combined with each contact model to form a set of query densities. Stage 3: many grasps are generated, each by selecting a hand-link, sampling a link pose on the new object from the query density and sampling a hand configuration. Stage 4: grasp optimisation maximises grasp likelihood. This stage is repeated until convergence.

2.4 Learning a mapping from sensor to grasp

The next wave of grasp generation methods instead learned this mapping from sensor data to grasp. Most of these methods learn to relate grasps to features extracted from the object representation, such as SIFT or other features (Saxena et al., 2008; Fischinger and Vincze, 2012), shape primitives (Platt et al., 2006), box-decompositions (Huebner and Kragic, 2008) or object parts (Kroemer et al., 2012; Detry et al., 2013). The grasp itself can be parametrised as a grasp position (Saxena et al., 2008), gripper pose (Herzog et al., 2014) or a set of contact points (Ben Amor et al., 2012; Bohg and Kragic, 2010). Some methods learn grasps from demonstration (Ekvall and Kragic, 2004; Hillenbrand and Roa, 2012; Kopicki et al., 2014, 2015; Hsiao and Lozano-Perez, 2006), and in the case of Kopicki et al. (2015) create a generative model able to generate many grasp candidates for the target object. Others learn a distribution of possible grasps indexed by features from semi-autonomous grasping experiments (Detry et al., 2011). Recently, deep learning has been applied to learn such mappings (Redmon and Angelova, 2015; Kumra and Kanan, 2017).

2.5 Learning a grasp evaluation function

Learning approaches have also been applied to the problem of acquiring a grasp evaluation function from data. For example, grasp stability for an executed grasp can be learned (Bekiroglu et al., 2011b). An evaluation of a proposed grasp can also be learned, and this problem has recently been tackled using data-intensive learning methods. Most of these methods predict the grasp quality for a parallel-jaw gripper (Pinto and Gupta, 2016; Lenz et al., 2015; Johns et al., 2016; Mahler et al., 2016, 2017; Redmon and Angelova, 2015; Seita et al., 2016; Wang et al., 2016; Gualtieri et al., 2016; Levine et al., 2017). Pinto and Gupta (2016), for example, learn a function that predicts the probability of grasp success for an image patch and gripper angle. To reduce the number of real grasps required, a rigid-body simulation (Johns et al., 2016; Bousmalis et al., 2018) or a synthetic dataset (Mahler et al., 2016, 2017) may be used. A synthetic dataset requires that analytic grasp metrics be computed offline. Mahler et al. (2016) use a synthetic dataset to predict the probability of force closure under uncertainty in object pose, gripper pose, and friction coefficient. Seita et al. (2016) performed supervised learning of grasp quality using deep learning and random forests.

Gualtieri et al. (2016) predict whether a grasp will be good or bad using a CNN trained on depth images, and use instance or category knowledge of the object to improve the prediction. Hyttinen et al. (2015) used tactile signatures fed to a trained classifier to predict object grasp stability.

2.6 Deep learning of dexterous grasping

A small number of papers have explored deep learning as a method for dexterous grasping (Lu et al., 2017; Varley et al., 2015; Veres et al., 2017; Zhou and Hauser, 2017; Kappler et al., 2015). All of these methods use simulation to generate the set of training examples for learning. Kappler et al. (2015) showed the ability of a CNN to predict grasp quality for multi-fingered grasps, but used complete point clouds as object models and varied only the wrist pose for the pre-grasp position, leaving the finger configurations unchanged. Varley et al. (2015) and later Zhou and Hauser (2017) went beyond this: each is able to vary the hand pre-shape, and each predicts from a single image of the scene. Both pose grasp search as a pure optimisation problem (using simulated annealing or quasi-Newton methods) on the output of the CNN. They also both learn an evaluative model, and generate candidates for evaluation uninfluenced by prior knowledge. Veres et al. (2017), in contrast, learn a deep generative model. Finally, Lu et al. (2017) learn an evaluative model and then, given an input image, use gradient descent to optimise the model inputs that describe the wrist pose and hand pre-shape, but do not learn a generative model. In addition, their grasps start from a heuristic grasp that is then varied within a limited envelope. Of the papers on dexterous grasp learning with deep networks, only those by Varley et al. (2015) and Lu et al. (2017) have been tested on real objects, with eight and five test objects and success rates of 75% and 84% respectively.

2.7 Relation of this work to the literature

The main similarities and differences between this work and previous methods reported in the literature may be summarised as follows. First, our method falls within the category of learning a mapping from sensory input to grasp. Thus, it differs from methods that learn an evaluation function for a proposed grasp. Second, like Kopicki et al. (2015), it learns a generative model, and is thus able to generate many candidate grasps for a new object, rather than just one. One particular property of the method we build upon (Kopicki et al., 2015) is that it can learn from a very small number of demonstrations. That previous work has two main drawbacks. First, it is not sufficiently robust when grasping a novel object from a single view. Second, it is purely memory-based, so it cannot merge learned models and therefore does not extract the best models from an increasing amount of data. This paper addresses both drawbacks.

3 Generative Grasp Modelling Basics

The general approach is one of Learning from Demonstration (LfD). We first sketch the general structure of learning a generative grasp model from demonstration (Kopicki et al., 2015). This structure is the same across both the algorithm described in Kopicki et al. (2015) and the algorithm described here. For simplicity, we refer to the algorithm from Kopicki et al. (2015) as the vanilla algorithm. Throughout we assume that the robot's hand comprises rigid links: a palm, and several phalanges per finger. We denote the set of hand-links by $\mathcal{L} = \{L_i\}_{i=1}^{N_L}$. (For clarity we refer throughout to hand-links.)

First, an example grasp is presented, and then a model of that grasp is learned. This grasp model comprises two different types of sub-model (Figure 3: Stage 1). Both of these types of model are probability densities. The first type is a contact-model of the spatial relation between a hand-link and the local object shape near its point of contact. A contact-model is learned for every hand-link in contact with the object. The second type is a hand-configuration model, that captures the overall hand-shape.

Given this learned model of a grasp, new grasps can be generated and evaluated for novel objects. This grasp transfer occurs in three stages (Figure 3: Stages 2-4). First, each contact model is combined with the available point cloud model of the new object, to construct a third type of density called a query density (Stage 2). We build one query density for every contact model created during learning. Each query density is a probability density over where the hand-link will be on the new object.

Next, we generate initial candidate grasps on the new object (Stage 3) from the generative model of a grasp. Each candidate grasp is produced in three steps. First a hand-link is chosen at random. Then a pose for this hand-link on the new object is sampled. Finally, a configuration of the hand is sampled. The sampling of the hand-link pose uses the query density for that hand-link. The sampling of the hand-configuration uses the hand-configuration model. By forward kinematics these three samples (hand-link, hand-link pose and hand configuration) together determine a grasp. Many such initial grasps are generated in this way.
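The three sampling steps above can be sketched as follows. This is a minimal illustration under loud assumptions: each query density is represented by hypothetical weighted position particles with an isotropic Gaussian kernel (orientation is omitted), and `sample_config` and `forward_kinematics` are hypothetical placeholders for the hand-configuration model and the robot's kinematics, not functions from the original work.

```python
import random


def sample_kde_position(particles, weights, sigma, rng):
    """Draw one sample from a weighted isotropic Gaussian KDE over 3D positions.

    Sampling a KDE: pick a particle with probability proportional to its
    weight, then perturb it with the kernel's Gaussian noise.
    """
    (centre,) = rng.choices(particles, weights=weights, k=1)
    return tuple(c + rng.gauss(0.0, sigma) for c in centre)


def generate_candidate(query_densities, sample_config, forward_kinematics, sigma, rng):
    """Generate one initial grasp candidate.

    query_densities: dict mapping a hand-link name to (particles, weights)
    sample_config: hypothetical callable(rng) -> hand configuration
    forward_kinematics: hypothetical callable(link, link_pose, config) -> grasp
    """
    link = rng.choice(sorted(query_densities))  # 1. choose a hand-link at random
    particles, weights = query_densities[link]
    link_pose = sample_kde_position(particles, weights, sigma, rng)  # 2. sample its pose (position only)
    config = sample_config(rng)  # 3. sample a hand configuration
    return forward_kinematics(link, link_pose, config)  # together the samples fix the grasp
```

Calling `generate_candidate` in a loop (for example 2000 times) yields the pool of initial grasps that the refinement stage then optimises.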

In the final stage each initial grasp is refined by simulated annealing (Stage 4). The goal is to improve each initially generated grasp so as to maximise its likelihood according to the generative model. The optimisation criterion is a product of experts. There is one expert for each hand-link and one expert for the hand-configuration. These experts are the query-densities and the hand-configuration model.
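As a toy illustration of the refinement stage, the sketch below maximises a product of experts with simulated annealing. Maximising a product of densities is equivalent to maximising the sum of their log-densities. Assumptions not taken from the paper: the "grasp" is reduced to a 2D point, each expert is an isotropic Gaussian standing in for a query density or the hand-configuration model, and the Gaussian proposal and geometric cooling schedule are arbitrary choices.

```python
import math
import random


def log_gaussian(x, mean, sigma):
    """Log-density of an isotropic 2D Gaussian, used here as one 'expert'."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -d2 / (2.0 * sigma ** 2) - math.log(2.0 * math.pi * sigma ** 2)


def poe_log_likelihood(x, experts):
    """Log of a product of experts, i.e. the sum of the experts' log-densities."""
    return sum(log_gaussian(x, mean, sigma) for mean, sigma in experts)


def simulated_annealing(score, x0, steps=5000, t0=1.0, cooling=0.999, step=0.1, seed=0):
    """Maximise `score` with Metropolis acceptance and geometric cooling.

    Returns the best point seen and its score.
    """
    rng = random.Random(seed)
    x, fx = list(x0), score(x0)
    best, fbest = list(x), fx
    temperature = t0
    for _ in range(steps):
        candidate = [c + rng.gauss(0.0, step) for c in x]
        fc = score(candidate)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if fc >= fx or rng.random() < math.exp((fc - fx) / temperature):
            x, fx = candidate, fc
            if fx > fbest:
                best, fbest = list(x), fx
        temperature *= cooling
    return best, fbest
```

With three equal-width Gaussian experts centred at (0, 0), (2, 0) and (1, 3), the product peaks at their centroid (1, 1), so the returned point should land close to it.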

Generative grasp learning of this type has several desirable properties. First, it is known to be somewhat robust to partial point-cloud data for both training and test objects. Second, it displays generalisation to globally different test shapes. Third, the speed of inference is quite good if the learner is only given a small number of training grasps (2000 candidate grasps are generated and refined in sec on a modern 16-core PC). Fourth, there is evidence of robustness to variation in the orientation of the novel object.

However, as mentioned previously, there are also weaknesses in this type of generative grasp model. These are: (i) a need to further improve robustness when grasping from a single-view of the test object; (ii) a need to extract the best possible generative models as the number of training grasps grows.

Having sketched the basic structure of generative grasp learning we now detail model learning and grasp inference for our new algorithm. We start by describing the basic representations that underpin the work. As we proceed we will highlight the innovations made.

4 Representations

The method requires that we define several models: an object model (partial and acquired from sensing); a model of the contact between a hand-link and the object; and a model of the whole hand configuration.

Since all of these models are probability densities, underpinning all of them is a density representation. We first describe the kernel density representation we employ. Then we describe the various local surface descriptors we may use as the basis for the contact and query density models. We follow this with a description of each model type.

4.1 Kernel Density Estimation

$SO(3)$ denotes the group of rotations in three dimensions. A feature belongs to the space $SE(3)\times\mathbb{R}^N$, where $SE(3)$ is the group of 3D poses, and surface descriptors are composed of $N$ real numbers. We extensively use probability density functions (PDFs) defined on $SE(3)\times\mathbb{R}^N$. We represent these PDFs non-parametrically with a set of $K$ features (or particles) $x_j$:

$$S = \{x_j : x_j \in SE(3)\times\mathbb{R}^N\}_{j\in[1,K]} \tag{1}$$


The probability density in a region is determined by the local density of the particles in that region. The underlying PDF is created through kernel density estimation (Silverman, 1986), by assigning a kernel function $\mathcal{K}$ to each particle supporting the density, as

$$\mathrm{pdf}(x) \simeq \sum_{j=1}^{K} w_j\, \mathcal{K}(x \mid x_j, \sigma) \tag{2}$$


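where $\sigma$ is the kernel bandwidth and $w_j \in \mathbb{R}^+$ is a weight associated with $x_j$ such that $\sum_{j} w_j = 1$. We use a kernel that factorises into three functions defined by the separation of $x$ into $p \in \mathbb{R}^3$ for position, a quaternion $q \in SO(3)$ for orientation, and $r \in \mathbb{R}^N$ for the surface descriptor. Furthermore, let us define $\mu$ and $\sigma$:

$$\mu = (\mu_p, \mu_q, \mu_r) \tag{3a}$$

$$\sigma = (\sigma_p, \sigma_q, \sigma_r) \tag{3b}$$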


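We define our kernel as

$$\mathcal{K}(x \mid \mu, \sigma) = \mathcal{N}_3(p \mid \mu_p, \sigma_p)\,\Theta(q \mid \mu_q, \sigma_q)\,\mathcal{N}_N(r \mid \mu_r, \sigma_r) \tag{4}$$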


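where $\mu$ is the kernel mean point, $\sigma$ is the kernel bandwidth, $\mathcal{N}_n$ is an $n$-variate isotropic Gaussian kernel, and $\Theta$ corresponds to a pair of antipodal von Mises-Fisher distributions which form a Gaussian-like distribution on $SO(3)$ (Fisher, 1953; Sudderth, 2006). It is the natural analogue of the Gaussian for circular variables such as orientation. The value of $\Theta$ is given by

$$\Theta(q \mid \mu_q, \sigma_q) = C_4(\sigma_q)\,\frac{e^{\sigma_q\,\mu_q^{\top} q} + e^{-\sigma_q\,\mu_q^{\top} q}}{2} \tag{5}$$

where $C_4(\sigma_q)$ is a normalising constant, and $\mu_q^{\top} q$ denotes the quaternion dot product. The two antipodal terms make $\Theta$ invariant to the sign of $q$, as required because $q$ and $-q$ encode the same rotation.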
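The kernel density of Eq. (2) with the factorised kernel of Eq. (4) can be evaluated directly, as in the minimal pure-Python sketch below. Assumptions: quaternions are $(w, x, y, z)$ tuples of unit length; the bandwidth is passed as a triple $(\sigma_p, \sigma_q, \sigma_r)$, with $\sigma_q$ acting as the von Mises-Fisher concentration (larger means tighter), which is how we read the orientation factor; and the normaliser is the standard von Mises-Fisher constant on the 3-sphere, $C_4(\kappa) = \kappa / (4\pi^2 I_1(\kappa))$, with $I_1$ the modified Bessel function of the first kind. The function names are our own.

```python
import math


def gaussian_iso(x, mean, sigma):
    """n-variate isotropic Gaussian density N_n(x | mean, sigma); sigma is a standard deviation."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, mean))
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2) ** (n / 2.0)


def bessel_i1(k, tol=1e-15):
    """Modified Bessel function I_1(k) from its series sum_m (k/2)^(2m+1) / (m! (m+1)!)."""
    half = k / 2.0
    term = half  # m = 0 term
    total = term
    m = 0
    while term > tol * total:
        term *= half * half / ((m + 1) * (m + 2))  # ratio between consecutive series terms
        total += term
        m += 1
    return total


def theta(q, mu_q, kappa):
    """Pair of antipodal von Mises-Fisher kernels on unit quaternions (moderate kappa > 0)."""
    c4 = kappa / (4.0 * math.pi ** 2 * bessel_i1(kappa))  # vMF normaliser on the 3-sphere
    dot = sum(a * b for a, b in zip(q, mu_q))  # quaternion dot product
    return c4 * (math.exp(kappa * dot) + math.exp(-kappa * dot)) / 2.0


def kernel(x, mu, sigma):
    """Factorised kernel: position Gaussian * orientation vMF pair * descriptor Gaussian."""
    (p, q, r), (mu_p, mu_q, mu_r), (s_p, s_q, s_r) = x, mu, sigma
    return gaussian_iso(p, mu_p, s_p) * theta(q, mu_q, s_q) * gaussian_iso(r, mu_r, s_r)


def kde_pdf(x, particles, weights, sigma):
    """Weighted kernel density estimate: sum_j w_j K(x | x_j, sigma)."""
    return sum(w * kernel(x, xj, sigma) for w, xj in zip(weights, particles))
```

Very large concentrations (above roughly 700) would overflow `math.exp`; a production version would work in log space.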

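Using this representation, conditional and marginal probabilities can easily be computed from Eq. (2). The marginal density over the pose $v = (p, q)$ is computed as

$$\mathrm{pdf}(v) = \int \sum_{j=1}^{K} w_j\, \mathcal{N}_3(p \mid p_j, \sigma_p)\,\Theta(q \mid q_j, \sigma_q)\,\mathcal{N}_N(r \mid r_j, \sigma_r)\, dr = \sum_{j=1}^{K} w_j\, \mathcal{N}_3(p \mid p_j, \sigma_p)\,\Theta(q \mid q_j, \sigma_q) \tag{6}$$

where $x_j = (p_j, q_j, r_j)$. The conditional density is given by

$$\mathrm{pdf}(v \mid r) = \frac{\mathrm{pdf}(v, r)}{\mathrm{pdf}(r)} = \frac{\sum_{j=1}^{K} w_j\, \mathcal{N}_N(r \mid r_j, \sigma_r)\,\mathcal{N}_3(p \mid p_j, \sigma_p)\,\Theta(q \mid q_j, \sigma_q)}{\sum_{j=1}^{K} w_j\, \mathcal{N}_N(r \mid r_j, \sigma_r)} \tag{7}$$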
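The conditional density above has a convenient reading: conditioning on a surface descriptor $r$ re-weights each particle by how well its descriptor $r_j$ matches $r$ under an isotropic Gaussian, leaving a pose-only kernel density over the same particle poses; the marginal over pose simply keeps the original weights. The minimal sketch below computes these conditional weights; the function names are our own.

```python
import math


def gaussian_iso(x, mean, sigma):
    """n-variate isotropic Gaussian density N_n(x | mean, sigma)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, mean))
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2) ** (n / 2.0)


def conditional_pose_weights(weights, descriptors, r, sigma_r):
    """Weights w'_j of the pose-only KDE pdf(v | r).

    pdf(v | r) = sum_j w'_j N_3(p | p_j) Theta(q | q_j), where
    w'_j = w_j N_N(r | r_j) / sum_k w_k N_N(r | r_k).
    """
    raw = [w * gaussian_iso(r, rj, sigma_r) for w, rj in zip(weights, descriptors)]
    total = sum(raw)  # this is pdf(r), the denominator of the conditional
    if total == 0.0:
        raise ValueError("descriptor r has zero likelihood under every particle")
    return [a / total for a in raw]
```

The design benefit is that sampling from the conditional needs no new machinery: one picks a particle by its conditional weight and perturbs its pose with the position and orientation kernels.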


4.2 Surface Descriptors

To condition the contact models it is necessary to have some descriptor of the local surface properties in the region of the contact. In principle these descriptors could capture any property, including local curvatures, surface smoothness, etcetera, that may influence the finger pose. In this paper we consider two different surface descriptors based solely on point-cloud data: local curvatures and fast point feature histograms (FPFH). We briefly describe the former. The latter is described in Rusu et al. (2009a).

The principal curvatures are the surface descriptor used in the vanilla algorithm. To create the descriptor, all the points in the point cloud are augmented with a frame of reference and a local curvature descriptor. For compactness, we also denote the pose of a feature as $v = (p, q)$. As a result,

$$x = (p, q, r) = (v, r) \tag{8}$$

The surface normal at $p$ is computed from the nearest neighbours of $p$ using a PCA-based method (e.g. Kanatani (2005)). The surface descriptors are the local principal curvatures, as described in Spivak (1999). Their directions are denoted $k_1, k_2 \in \mathbb{R}^3$, and the curvatures along $k_1$ and $k_2$ form a 2-dimensional feature vector $r = (r_1, r_2) \in \mathbb{R}^2$. The surface normal and the principal directions define the orientation $q$ of a frame that is associated with the point $p$.
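The PCA-based normal estimate can be sketched as follows: the normal is the eigenvector of the neighbourhood covariance with the smallest eigenvalue. To stay dependency-free, this sketch finds that eigenvector by power iteration on $\operatorname{tr}(C)\,I - C$, whose dominant eigenvector is the smallest-eigenvalue eigenvector of $C$; a linear-algebra library would normally be used instead. The neighbourhood size `k` is an arbitrary assumption, the neighbourhood is assumed non-degenerate (not all points collinear or coincident), and the principal curvatures are not computed here.

```python
import math


def k_nearest(points, query, k):
    """Brute-force k nearest neighbours of `query` (including itself if present)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


def pca_normal(points, query, k=8, iters=200):
    """Estimate the surface normal at `query` as the least-variance direction of its neighbours.

    The sign of the returned unit vector is arbitrary.
    """
    nbrs = k_nearest(points, query, k)
    n = len(nbrs)
    centroid = [sum(p[i] for p in nbrs) / n for i in range(3)]
    cov = [[sum((p[i] - centroid[i]) * (p[j] - centroid[j]) for p in nbrs) / n
            for j in range(3)] for i in range(3)]
    # tr(C) I - C has the same eigenvectors as C, with the eigenvalue order reversed.
    t = cov[0][0] + cov[1][1] + cov[2][2]
    shifted = [[(t if i == j else 0.0) - cov[i][j] for j in range(3)] for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):  # power iteration converges to the dominant eigenvector
        w = [sum(shifted[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("degenerate neighbourhood")
        v = [c / norm for c in w]
    return tuple(v)
```

For points sampled from the plane $z = 0$, the returned normal is $(0, 0, \pm 1)$.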

Figure 4: A two fingered grasp of an object, shown in cross-section. Left: the vanilla algorithm incorporates point-cloud data from all views. Centre and right: the new algorithm learns a separate model for each view.

4.3 Object View Model

The new method proposed in this article requires a set of view-based models of each training object. Let there be $N_g$ object-grasp training examples, indexed by $g$. Let each example $g$ be observed from $N_V^g$ views, indexed by $m$. A model of the grasped object from view $m$ of example $g$, denoted $O_{mg}$, is computed from a single point cloud captured by a depth sensor as a set of $K_{mg}$ features $S_{mg} = \{x_j\}_{j=1}^{K_{mg}}$. This set of features defines, in turn, a joint probability distribution, which we call the object-view model:

$$O_{mg}(x) \equiv \mathrm{pdf}^{O_{mg}}(x) \simeq \sum_{j=1}^{K_{mg}} w_j\, \mathcal{K}(x \mid x_j, \sigma_x) \tag{9}$$

where $O_{mg}(x)$ is short for $\mathrm{pdf}^{O_{mg}}(x)$, $x_j \in S_{mg}$, $\mathcal{K}$ is defined in Eq. (4) with bandwidth $\sigma_x$, and where all weights are equal, $w_j = 1/K_{mg}$. In summary, this object-view model represents the object surface as a pdf over surface points and descriptors.

5 Contact Learning

Having set up the basic representations, we now describe the learning procedure. This includes the new view-based grasp model, and the procedure to merge contact-models learned from different grasp examples. This section corresponds to the left branch of Stage 1 of Figure 3.


Figure 5: Top: in the vanilla model there is one model per grasp example. Bottom: in this paper there is one grasp model per view. Each view-based model contains contact models for the contacts that fall within the view and a copy of the hand-shape model.

In Kopicki et al. (2015) the representation required that all views of the training object were registered into a single point cloud (Figure 4 left). A model of the grasp was learned from this registered point cloud. Instead, in this paper, a separate grasp-model is learned for each view (Figure 4 centre and right). Each view-based grasp model contains both a model of the hand shape—thereby modelling all the hand-link positions relative to one another—and a model of each of the hand-object contacts that can be seen in that view. Thus each view-based grasp model excludes contact information for contacts it cannot see.

This means that the grasp models are organised by view (Figure 5). The purpose is that the learned models more closely reflect the partial information available to the robot when grasping an object from a single available view. At inference time, this means that the grasp optimisation procedure will not try to force every hand-link that was in contact in the training grasp into contact with a visible surface; instead it relies on the hand shape model to implicitly guide hand-links to hidden back surfaces.

5.2 Contact Receptive Field

Having defined the structure of the view-based grasp model, we now define the contact models that form part of it. This involves defining a receptive field around a contact, which determines how important different points on the object surface are in the contact model.

Figure 6: The contact receptive field associated with the hand-link $L_i$ (solid yellow block) with link pose $s_i$. The black dots are samples from the surface of an object. The distance between feature $v_j$ and the closest point $a_{ij}$ on the link's surface is shown. The rounded rectangle illustrates the cut-off distance $\delta_i$. The poses $v_j$ and $s_i$ are expressed in the world frame. The arrow $u_{ij}$ is the pose of $L_i$ relative to the frame for the surface feature $v_j$.

The contact receptive field is a region of space relative to the associated hand-link $L_i$ (see Fig. 6) which specifies the neighbourhood of that link. The contact receptive field is realised as a function $\psi_i$ of surface feature pose $v = (p, q)$:

$$\psi_i : SE(3) \to [0, 1] \tag{10}$$

the value of which determines the relevance of a particular surface feature to a given hand-link in terms of the likelihood of physical contact. We use contact receptive fields which are a family of parameterised functions whose value falls off quickly with the distance to the link:

$$\psi_i(v) = \begin{cases} \exp\left(-\lambda_i \lVert p - a_i \rVert^2\right) & \text{if } \lVert p - a_i \rVert < \delta_i \\ 0 & \text{otherwise} \end{cases} \tag{11}$$

where $\lambda_i > 0$ and $a_i$ is the point on the surface of $L_i$ that is closest to $p$. This means that the contact receptive field takes account only of the local shape, while falling off smoothly. A variety of monotonic, fast-declining functions could be used instead.
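A minimal sketch of the receptive field follows. Unlike the paper, which works with the link's tri-mesh, it assumes that the hand-link is an axis-aligned box centred at the origin of its own frame and that feature positions are already expressed in that frame. The closest surface point is then obtained by clamping each coordinate, and points inside the box are treated as being at distance zero. The parameter values used in any example are illustrative only.

```python
import math


def closest_point_on_box(p, half_extents):
    """Closest point to p on an axis-aligned box centred at the origin (clamping).

    For points inside the box this returns p itself, i.e. distance zero.
    """
    return tuple(max(-h, min(h, c)) for c, h in zip(p, half_extents))


def receptive_field(p, half_extents, lam, delta):
    """Contact receptive field: exp(-lam * d^2) within the cut-off delta, else 0."""
    a = closest_point_on_box(p, half_extents)
    d = math.dist(p, a)
    return math.exp(-lam * d * d) if d < delta else 0.0
```

For a box of half-extents (0.01, 0.01, 0.03) with `lam=1000.0` and `delta=0.02`, a feature 1 cm from a face receives weight $e^{-0.1} \approx 0.905$, while a feature 4 cm away receives zero.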

5.3 Contact Model Density

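Now that we have defined the receptive field, we can define the contact model itself. We denote by $u_{ij}$ the pose of link $L_i$ relative to the pose $v_j$ of the $j$-th object feature. In other words, $u_{ij}$ is defined as

$$u_{ij} = v_j^{-1} \circ s_i \tag{12}$$

where $s_i$ denotes the pose of $L_i$, $\circ$ denotes the pose composition operator, and $v_j^{-1}$ is the inverse of $v_j$, with $v_j^{-1} = (-q_j^{-1} p_j,\ q_j^{-1})$ (see Fig. 6).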
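The relative pose of Eq. (12) needs only quaternion arithmetic. The sketch below represents a pose as a pair (position, unit quaternion $(w, x, y, z)$), defines composition as $(p_a, q_a) \circ (p_b, q_b) = (p_a + q_a p_b q_a^{-1},\ q_a q_b)$ and the inverse as $(-q^{-1} p\, q,\ q^{-1})$, which is one standard convention; the helper names are our own.

```python
def quat_mul(a, b):
    """Hamilton product of quaternions a * b, each given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_conj(q):
    """Conjugate, equal to the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q, p):
    """Rotate the 3D point p by the unit quaternion q: q * (0, p) * q^-1."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *p)), quat_conj(q))
    return (x, y, z)


def compose(a, b):
    """Pose composition a o b for poses (position, unit quaternion)."""
    pa, qa = a
    pb, qb = b
    rp = rotate(qa, pb)
    return (tuple(x + y for x, y in zip(pa, rp)), quat_mul(qa, qb))


def inverse(v):
    """Pose inverse: (-q^-1 p q, q^-1)."""
    p, q = v
    qi = quat_conj(q)
    return (tuple(-c for c in rotate(qi, p)), qi)


def relative_pose(v_j, s_i):
    """Pose of the link (s_i) expressed in the frame of the surface feature (v_j)."""
    return compose(inverse(v_j), s_i)
```

A quick sanity check is that composing the feature pose with the relative pose recovers the link pose, i.e. `compose(v_j, relative_pose(v_j, s_i))` equals `s_i` up to rounding.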

Contact model $M_{img}$ encodes the joint probability distribution of object surface features and of the relative pose of the $i$-th hand-link, for the $m$-th view of the $g$-th object-grasp example:

$$M_{img}(U, R) \equiv \mathrm{pdf}^{M_{img}}(U, R) \tag{13}$$

where $M_{img}(\cdot)$ is short for $\mathrm{pdf}^{M_{img}}(\cdot)$, $R$ is the random variable modelling object surface features of grasp-view example $(m, g)$, and $U$ models the pose of $L_i$ relative to the frame of reference defined for the feature. In other words, denoting realisations of $U$ and $R$ by $u$ and $r$, $M_{img}(u, r)$ gives the probability density of finding $L_i$ at pose $u$ relative to the frame of a nearby object surface patch exhibiting surface descriptor $r$.

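The contact model for link $L_i$, view $m$ and example $g$ is estimated as:

$$M_{img}(u, r) \simeq \frac{1}{Z} \sum_{j=1}^{K_{mg}} w_{ij}\, \mathcal{K}(u, r \mid u_{ij}, r_j, \sigma) \tag{14}$$

where $Z$ is a normalising constant, $w_{ij} = \psi_i(v_j)$, $x_j = (v_j, r_j) \in S_{mg}$, $K_{mg}$ is the number of features in the point cloud, and $\mathcal{K}$ is kernel function (4) defined at poses $u_{ij}$ from Eq. (12).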

We now also introduce, for the first time, the idea of a contact model norm, which estimates the extent of the likely area of physical contact of hand-link $L_i$ with surface features visible from view $m$ of grasp-object pair $g$:

$$\lVert M_{img} \rVert = \sum_{j=1}^{K_{mg}} w_{ij} \tag{15}$$

We use this norm to help estimate which links are reliably involved in a grasp.
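The contact-model estimate and its norm can be sketched together: every surface feature is weighted by the link's receptive field, features outside the cut-off are dropped, and the surviving features become weighted kernel centres at their relative poses, as in Eq. (12), for the kernel of Eq. (4). Assumptions, loudly: the normalising constant is taken to be the norm of Eq. (15), so that the kernel weights sum to one, and the receptive field and relative-pose computation are passed in as callables; the function name is our own.

```python
def build_contact_model(features, receptive_field, relative_pose):
    """Build the weighted kernel centres of one contact model, and its norm.

    features: iterable of (p_j, q_j, r_j) surface features of one object view
    receptive_field: callable(p_j) -> weight in [0, 1] for the hand-link
    relative_pose: callable((p_j, q_j)) -> pose u_ij of the link relative to the feature
    Returns (particles, norm) with particles a list of (u_ij, r_j, normalised weight).
    """
    raw = []
    for p, q, r in features:
        w = receptive_field(p)
        if w > 0.0:  # features beyond the cut-off distance contribute nothing
            raw.append((relative_pose((p, q)), r, w))
    norm = sum(w for _, _, w in raw)  # contact model norm: total receptive-field weight
    if norm == 0.0:
        return [], 0.0  # no visible contact: an empty contact model
    return [(u, r, w / norm) for u, r, w in raw], norm  # assumes Z equals the norm
```

Passing the receptive field and relative-pose functions as arguments keeps the sketch independent of any particular hand geometry or pose library.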

5.4 Contact model selection

A view-based learning framework has the consequence that not all learned models are useful. In any grasp-view pair some hand-links may make poor contacts with the observed parts of the object. This can also simply be caused by the relevant contact not being visible from a particular viewpoint. In both cases we must determine which contact-models should be created and which ignored. This is the purpose of lines 1-13 of Algorithm 1.

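The contact model selection procedure determines, for a given grasp example $g$, view $m$ and hand-link $L_i$, whether the contact model $M_{img}$ should be created. It proceeds in two phases. The first phase uses the contact model norm (15) to prune out unreliable contact models (Algorithm 1, lines 1-3). The decisions are recorded in a set of binary variables termed contact hypotheses, $h_{img}$:

$$h_{img} = \begin{cases} 1 & \text{if } \lVert M_{img} \rVert > \tau \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i \in [1, N_L],\ m \in [1, N_V^g],\ g \in [1, N_g] \tag{16}$$

where the contact model is retained if $h_{img} = 1$, $\tau$ is the threshold, $N_L$ is the number of hand links, $N_V^g$ is the number of views of grasp example $g$, and $N_g$ is the number of grasp examples.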

Having pruned out unreliable contact models from each grasp-view pair, the second phase prunes out unreliable grasp-view examples (Algorithm 1, lines 4-6). A grasp-view example is retained if the total number of non-empty contact models for view $m$ of grasp $g$, given by $\sum_{i=1}^{N_L} h_{img}$, is higher than some minimum number $N_{\min}$. This is encoded in a set of binary variables termed view hypotheses, $e_{mg}$:

$$e_{mg} = \begin{cases} 1 & \text{if } \sum_{i=1}^{N_L} h_{img} > N_{\min} \\ 0 & \text{otherwise} \end{cases} \tag{17}$$


After a number of example grasps we will thus obtain a set of non-empty contact models (Algorithm 1, lines 7-13). We may index these using a triplet of indices corresponding to the hand-link, view and grasp example. Because of contact-model pruning not all will have a contact-model, i.e. for some views, links or grasps the contact model will be empty. We denote the set of indices for the non-empty contact models . The size of this set is .
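The two pruning phases can be sketched as follows. The sketch assumes strict inequalities in both tests, as the prose suggests: a contact model is kept when its norm exceeds the threshold, and a grasp-view pair is kept when more than the minimum number of its links survive. `norm_threshold` and `min_links` are hypothetical names for the symbols in (16) and (17).

```python
from typing import List, Set, Tuple

Triplet = Tuple[int, int, int]  # (link, view, grasp), the index order used in the text

def select_contact_models(
    norms: List[List[List[float]]],  # norms[grasp][view][link] = contact model norm (15)
    norm_threshold: float,           # hypothetical name for the threshold in (16)
    min_links: int,                  # hypothetical name for the minimum count in (17)
) -> Set[Triplet]:
    selected: Set[Triplet] = set()
    for g, views in enumerate(norms):
        for v, link_norms in enumerate(views):
            # Phase 1: contact hypotheses, one binary decision per hand link.
            contact_hyp = [n > norm_threshold for n in link_norms]
            # Phase 2: view hypothesis, keep the grasp-view pair only if enough
            # links survived phase 1.
            if sum(contact_hyp) > min_links:
                selected.update((l, v, g) for l, keep in enumerate(contact_hyp) if keep)
    return selected

# Usage: one grasp seen from two views, with three hand links in each view.
norms = [[[0.9, 0.1, 0.5], [0.2, 0.1, 0.05]]]
print(sorted(select_contact_models(norms, norm_threshold=0.3, min_links=1)))
# [(0, 0, 0), (2, 0, 0)]
```

The second view is dropped entirely because none of its links passes the first test.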


The parameters , , and , , were chosen empirically. The time complexity for learning each contact model is where is the number of triangles in the tri-mesh describing hand link , and is the number of features of view and example grasp .

5.5 Clustering Contact Models

So far, we have defined how a contact model is learned. Using this memory based scheme, the number of contact models will grow linearly with the number of training examples. In a memory based learner, every contact model must be transferred to the target object. This may also limit the generalisation power of the contact models. This paper presents an alternative to memory based learning. We may exploit a growing number of training examples by merging contact models. This is the purpose of lines 14-22 of Algorithm 1. We hypothesize that this merging process will result in higher grasp success rates at transfer time. To merge models we first cluster the contact models according to similarity. Since each contact model is a kernel density estimator, the key step is to define an appropriate similarity measure between any pair of such estimators. Our principal aim is to produce a distance that is fast to compute and which is robust to the natural variations in the underlying data in the grasping domain. We define, first, an asymmetric divergence and then, on top of it, a symmetric distance. Since we are using kernel density estimators we can most simply define a distance between the sets of kernel centres.

First we define a distance between two kernels lying in , and (see (3a)), as a weighted linear combination of sub-distance measures:


where are weights, and


The operator extracts the surface normal, and denotes a dot product. Using this, we define the distance of kernel from kernel density as


This distance considers only the one kernel that is nearest to . This has two major advantages. First, it allows the use of fast nearest neighbour search techniques with time complexity rather than , where is the number of kernels in . Second, the distance given by (21) is independent of the remaining kernels in the density .² Additionally, for further efficiency, the distance measure (21) ignores kernel weights. This approach is valid since all weights are computed using (11). Thus, each weight depends only on the kernel position relative to the local frame of the relevant hand-link, which is already accounted for in the linear distance (20a).

² This is useful in our domain since is constructed from a depth image taken from an RGBD camera. The density of points underpinning varies with the object-camera distance. Using a distance between and that depends only on the closest kernel in renders the distance much less sensitive to variations in the density of . This improves generalisation.

Next, we use this distance to define the divergence of kernel density from kernel density


where is the number of kernels of . Note that this divergence (22) is asymmetric. For example may be large, while , if is constructed from by removing some large “surface patch”.
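The kernel distance and the divergence can be sketched in a few lines. The sketch assumes that the sub-distances in (20) are the Euclidean distance between kernel positions and one minus the dot product of the surface normals, with both weights set to 1; the actual forms and weights are in the stripped equations. It uses a brute-force nearest-neighbour search for clarity, whereas the text relies on a fast nearest-neighbour structure. The usage reproduces the asymmetry noted above by removing one "surface patch" from a density.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Kernel = Tuple[Vec3, Vec3]  # (centre position, unit surface normal); weights ignored

def kernel_distance(a: Kernel, b: Kernel, w_lin: float = 1.0, w_ang: float = 1.0) -> float:
    """Weighted sum of a linear and an angular sub-distance (assumed forms of (20))."""
    (pa, na), (pb, nb) = a, b
    linear = math.dist(pa, pb)
    angular = 1.0 - sum(x * y for x, y in zip(na, nb))  # 0 when the normals agree
    return w_lin * linear + w_ang * angular

def kernel_to_density(x: Kernel, g: Sequence[Kernel]) -> float:
    """Distance of kernel x from density g: only the nearest kernel of g matters."""
    return min(kernel_distance(x, y) for y in g)  # brute force, O(len(g))

def divergence(f: Sequence[Kernel], g: Sequence[Kernel]) -> float:
    """Divergence of f from g: mean distance of f's kernels from g (asymmetric)."""
    return sum(kernel_to_density(x, g) for x in f) / len(f)

# Usage: g is f with one "surface patch" (the kernel at x = 5) removed.
up = (0.0, 0.0, 1.0)
f = [((0.0, 0.0, 0.0), up), ((1.0, 0.0, 0.0), up), ((5.0, 0.0, 0.0), up)]
g = f[:2]
print(divergence(f, g), divergence(g, f))  # nonzero one way, zero the other
```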

1:  for all grasps , views , links  do
2:     compute using Eq. (16)
3:  end for
4:  for all grasps , views  do
5:     compute using Eq. (17)
6:  end for
7:  Set
8:  for all grasps , views , links  do
9:     if  and  then
10:        add index triplet
12:     end if
13:  end for
14:  Set to be an matrix
15:  for all pairs of triplets such that  do
16:     compute distance using Eq. (22)
17:  end for
18:   affinity-propagation()
19:  for all clusters  do
20:     compute using Eq. (24)
21:  end for
22:  return  
Algorithm 1 Contact model selection and clustering

To cluster the contact models, however, we require a symmetric distance, which we define as:


It is worth noting some other benefits of this distance definition within our domain. First, (21) ignores surface descriptors. This is both because they can be high dimensional and because the shape properties they encode are already encoded in the remaining kernels of the contact models, and so are captured in (22). Thus, measuring the distance with respect to the surface descriptor adds no benefit. For the same reason, we do not compute the distances between pairs of local frames. Instead, we simply compare pairs of surface normals.

This distance is calculated for every pair of contact models (Algorithm 1, lines 14-17). The next stage is to cluster contact models using the metric defined by (23). Many clustering procedures could be used. We used affinity propagation (Frey and Dueck, 2007), which requires computation of all pair-wise distances (Algorithm 1, line 18). Affinity propagation finds clusters together with cluster exemplars, the most representative cluster members. Clustering creates a partition of the set of contact models , where is the number of clusters and denotes the -th cluster, which is a set with members. There is a one-to-one map from each contact model index onto its corresponding cluster and index within that cluster, . Thus, we can write out cluster as . Note that, when referring to a contact model as a member of a cluster, we use the superscript for clarity as to the meaning of the index in the subscript.
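Affinity propagation operates on a matrix of pairwise similarities. The sketch below is a compact NumPy version of the standard responsibility and availability updates of Frey and Dueck (2007). It uses negative distances as similarities, the median similarity as the shared preference, damping of 0.5 and a tiny random perturbation to break ties; these are common defaults, not settings taken from this paper. In the grasping setting the input would be the symmetric distance (23) between every pair of selected contact models; the usage substitutes a toy one-dimensional example with two well-separated groups.

```python
import numpy as np

def affinity_propagation(dist: np.ndarray, damping: float = 0.5, iters: int = 200,
                         seed: int = 0):
    """Affinity propagation on a symmetric distance matrix.

    Returns (exemplar indices, cluster label for every item).
    """
    n = dist.shape[0]
    S = -np.asarray(dist, dtype=float)               # similarity = negative distance
    off_diag = S[~np.eye(n, dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))         # shared preference (common default)
    S += 1e-9 * np.random.default_rng(seed).standard_normal((n, n))  # break exact ties
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)          # most similar exemplar per item
    labels[exemplars] = np.arange(exemplars.size)    # exemplars label themselves
    return exemplars, labels

# Toy usage: six "contact models" on a line, forming two well-separated groups.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
exemplars, labels = affinity_propagation(D)
print(exemplars, labels)
```

Because the preference is shared, the number of clusters is not fixed in advance, which suits a training set whose number of distinct contact types is unknown.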

Figure 7: The prototypes produced for four of the clusters after affinity propagation. Each picture shows the density over rigid body transformations for the surface relative to the frame (shown) attached to the finger link. Only the positions of the kernel centres (black dots) are shown. Surface descriptors and orientations have been marginalised out.

In order to boost the generalisation capability of the clustered contact models (in particular for small clusters), we create cluster prototypes, denoted , to replace cluster exemplars. We first define a multinomial distribution, for each cluster , over the members of that cluster, . This lets us define, in turn, a cluster prototype contact model as a mixture model:


We evaluate this mixture with a simple Monte Carlo sampler. The probabilities are obtained using the density distance (23) between a cluster member with index and the cluster exemplar:


where controls the spread of the probability density around the cluster exemplar. The are the normalised versions of the weights . A cluster prototype is calculated for each cluster (Algorithm 1, lines 19-21).
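A sketch of the prototype mixture and its Monte Carlo sampler, under stated assumptions. The member probabilities are assumed to take the form p_j ∝ exp(-d_j/σ), with d_j the distance (23) from member j to the exemplar, since Eq. (25) is not reproduced here. Each draw first picks a member with these probabilities and then picks one of that member's kernels in proportion to its normalised weight. Kernel centres are one-dimensional stand-ins for poses, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# A cluster member as (kernel centres, kernel weights); centres are 1-D stand-ins for poses.
Member = Tuple[List[float], List[float]]

def member_probabilities(dists_to_exemplar: Sequence[float], spread: float) -> List[float]:
    """Assumed form p_j ∝ exp(-d_j / spread); the exemplar itself has d_j = 0."""
    raw = [math.exp(-d / spread) for d in dists_to_exemplar]
    total = sum(raw)
    return [r / total for r in raw]

def sample_prototype(members: Sequence[Member], dists_to_exemplar: Sequence[float],
                     spread: float, n_samples: int, rng: random.Random) -> List[float]:
    """Monte Carlo draws from the prototype mixture: choose a member, then one of its kernels."""
    probs = member_probabilities(dists_to_exemplar, spread)
    draws = []
    for _ in range(n_samples):
        j = rng.choices(range(len(members)), weights=probs)[0]
        centres, weights = members[j]
        k = rng.choices(range(len(centres)), weights=weights)[0]
        draws.append(centres[k])
    return draws

# Usage: the exemplar (distance 0) and one member at distance ln 2.
members = [([0.0, 0.1], [0.5, 0.5]), ([1.0], [1.0])]
dists = [0.0, math.log(2.0)]
print(member_probabilities(dists, spread=1.0))  # roughly [0.667, 0.333]
draws = sample_prototype(members, dists, spread=1.0, n_samples=1000, rng=random.Random(0))
```

A small spread concentrates the prototype on the exemplar; a large one lets every member contribute almost equally, which is what helps small clusters generalise.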

We can visualise the resulting prototypes (Figure 7) and the cluster members (Figure 8). The clusters are coherent and well separated. This reflects the fact that, in terms of link-to-surface relations, there is a limited number of distinct contact types.

Figure 8: The cluster prototype (black dots) and three cluster members (red dots) for one of the clusters. The densities fall into types that are relatively easy to cluster, indicating that contact model merging via clustering does not lead to significant information loss.

Figure 9: Visualisation of contact model clusters formed.

The parameters , and were chosen empirically. The time complexity for computing all contact model pairwise distances is where is the number of contact models after the selection procedure has been applied, and where is the average number of kernels in a contact model. The time complexity of the clustering algorithm is where is the number of iterations of the affinity propagation algorithm (Frey and Dueck, 2007).

5.6 Hand Configuration Model

The hand configuration model , for a grasp , was originally introduced in Kopicki et al. (2015) and remains the same here. It is thus described for completeness. It encodes a set of configurations of the hand joints (i.e., joint angles), that are particular to a grasp example . The purpose of this model is to allow us to restrict the grasp search space, during grasp transfer, to be close to hand configurations in the training grasp. Learning this model is the right-hand branch of Stage 1 in Figure 3.

The hand configuration model encodes the hand configuration that was observed when grasping the training object, but also a set of configurations recorded during the approach towards the object. We denote by the joint angles at some small distance before the hand reached the training object, and by the hand joint angles at the time when the hand made contact with the training object. We consider a set of configurations interpolated between and , and extrapolated beyond , as


where and is a regularly spaced set of values from the real-line. For all , configurations are beyond . A hand configuration model is constructed by applying kernel density estimation


where and . and were hand tuned and kept fixed in all the experiments. One hand-configuration model is learned for each example grasp. The complexity of learning a hand-configuration model is , where is the number of example grasps.
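The construction can be sketched as follows, assuming a linear blend (1 − γ)·q_pre + γ·q_contact between the pre-contact and contact joint angles, so that γ > 1 extrapolates past contact, and an isotropic Gaussian kernel for the density. Both forms are assumptions for illustration, as are the function names and the bandwidth value.

```python
import math
from typing import List, Sequence

def approach_configurations(q_pre: Sequence[float], q_contact: Sequence[float],
                            gammas: Sequence[float]) -> List[List[float]]:
    """Assumed linear blend (1 - g) * q_pre + g * q_contact; g > 1 extrapolates past contact."""
    return [[(1.0 - g) * a + g * b for a, b in zip(q_pre, q_contact)] for g in gammas]

def hand_config_density(q: Sequence[float], centres: Sequence[Sequence[float]],
                        bandwidth: float) -> float:
    """Unnormalised isotropic Gaussian KDE over joint-angle vectors (assumed kernel)."""
    total = 0.0
    for c in centres:
        sq = sum((x - y) ** 2 for x, y in zip(q, c))
        total += math.exp(-sq / (2.0 * bandwidth ** 2))
    return total / len(centres)

# Usage: a two-joint hand; regularly spaced gammas, the last one past contact.
centres = approach_configurations([0.0, 0.0], [1.0, 2.0], [0.0, 0.5, 1.0, 1.5])
print(centres)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0], [1.5, 3.0]]
print(hand_config_density([1.0, 2.0], centres, bandwidth=0.5))
```

Extrapolating slightly beyond the contact configuration lets the model represent fingers closing further onto a test object that is a little smaller than the training object.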

Having completed the description of the learning procedure (Stage 1 in Figure 3) we turn to describing our novel grasp inference procedures (Stages 2-4).

6 Grasp Inference

The inference of grasps on a new object relies on three procedures: (i) a procedure of transferring contact models to the new object; (ii) a grasp generation procedure and (iii) a grasp optimisation procedure. These correspond to Stages 2, 3 and 4 of Figure 3 respectively.

In this paper the novel contribution to grasp inference is that we modify the procedure for transferring contact models so as to improve the quality of the proposed grasps. This is achieved by incorporating the density-divergence measure introduced earlier in the paper.

6.1 Contact Query Density Computation

A query density is, for a particular hand-link and an object model, a density over the pose of that hand-link relative to the object. Query densities are used both to generate and to evaluate the likelihood of candidate grasps. Intuitively, the query density encourages a finger link to make contact with the object at locations with similar local surface properties to those in the training example. A query-density is simply the result of convolving two densities: a contact model density and the object model density. This section describes the formation of a query-density (Figure 3 Stage 2). The main innovation is that we present a new likelihood function for generating and evaluating finger contacts with the object. If we were to directly adopt the previous approach of Kopicki et al. (2015) the query density would be defined as:


where is the test object-view model (9). As described previously, this is a joint density over frames in global workspace coordinates and over surface descriptors . The term is the Dirac delta function, since is determined by . The relevant contact model is factored into the product . Algorithmically, this density is approximated using importance sampling.

In this paper, the query density is re-defined. Specifically, the term is replaced. This term defines a density over the test-object’s surface shape around , according to the contact model. But is only a low-dimensional summary of the ideal surface shape. To avoid the resulting loss of information we may, instead of , use a conditional density over the precise surface shape in the neighbourhood of .


where is the surface patch, on the test object, in the neighbourhood of .

Substituting this for gives us a new query density definition:


We desire that the more alike the test surface is to the training surface the higher should be. The density divergence defined earlier is ideal for this purpose:


where is a constant and we define as a composition of a transform and a set. First, recall that and that . Then, for every pair in , we simply compose and


So that, when we extend this to , we obtain


which performs a rigid body transform on the density over surface shape, defined by the contact model, so as to map it onto the test object’s actual surface around . The divergence of the relevant surface patch density from the transformed contact model density is then given by Eq. (31), and thus by Eq. (22).
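A simplified two-dimensional sketch of this likelihood. The contact model's kernel positions are moved by a rigid transform (a rotation angle plus a translation) onto the test surface, the divergence of the test patch from the transformed model is computed as in (22) using positions only, and the likelihood is assumed to take the form exp(−λD), with λ standing in for the constant in (31). The planar setting, the omission of normals and descriptors, and the exponential form are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def rigid_transform(points: Sequence[Point], angle: float, t: Point) -> List[Point]:
    """Rotate by `angle` (radians) about the origin, then translate by t."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def divergence(f: Sequence[Point], g: Sequence[Point]) -> float:
    """Mean distance from each kernel of f to its nearest kernel of g (positions only)."""
    return sum(min(math.dist(p, q) for q in g) for p in f) / len(f)

def surface_likelihood(patch: Sequence[Point], model: Sequence[Point],
                       angle: float, t: Point, lam: float) -> float:
    """Assumed form exp(-lam * D(patch || transformed contact model))."""
    return math.exp(-lam * divergence(patch, rigid_transform(model, angle, t)))

# Usage: a flat two-kernel contact model and a test patch one unit above it.
model = [(0.0, 0.0), (1.0, 0.0)]
patch = [(0.0, 1.0), (1.0, 1.0)]
print(surface_likelihood(patch, model, 0.0, (0.0, 1.0), lam=2.0))  # aligned
print(surface_likelihood(patch, model, 0.0, (0.0, 0.0), lam=2.0))  # misaligned
```

The closer the test surface matches the transformed training surface, the nearer the likelihood is to one, which is the behaviour the text asks of this term.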

1:  for all samples to  do
2:     sample
3:     sample from conditional density
4:     set
5:     set
6:     separate into position and quaternion
7:  end for
8:  normalise weights such that
9:  return  
Algorithm 2 Query density formation

As mentioned above, the query density is approximated using importance sampling. When a test object-view model, , is presented, a set of query densities is calculated, one for each contact model prototype , , according to (34). The algorithm proceeds as follows. Each consists of kernels centred on weighted hand-link poses:


with -th kernel centre , and gives the weight


The sampling procedure is detailed in Algorithm 2. First, a joint sample is taken from (line 2), then is sampled from (line 3). This completely specifies a possible hand-link pose and curvature combination (line 4). Then the importance weight is calculated (line 5). The weights are normalised before the set of kernels is returned (lines 8-9).
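The loop of Algorithm 2 can be written generically, with the densities supplied as callables. This is a sketch under simplifying assumptions. Composing the surface frame with the sampled link offset (line 4) is folded into `sample_link`, the position/quaternion split (line 6) is omitted, and the toy usage works on a one-dimensional surface whose likelihood merely prefers points near 0.5, standing in for the density-divergence likelihood. All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple, TypeVar

SurfaceT = TypeVar("SurfaceT")  # a sample from the object-view model (frame + descriptor)
PoseT = TypeVar("PoseT")        # a hand-link pose in workspace coordinates

def build_query_density(
    n_kernels: int,
    sample_surface: Callable[[random.Random], SurfaceT],
    sample_link: Callable[[SurfaceT, random.Random], PoseT],
    likelihood: Callable[[SurfaceT, PoseT], float],
    rng: random.Random,
) -> Tuple[List[PoseT], List[float]]:
    """Importance sampling in the spirit of Algorithm 2: weighted hand-link pose kernels."""
    poses: List[PoseT] = []
    weights: List[float] = []
    for _ in range(n_kernels):
        surface = sample_surface(rng)              # joint sample from the object-view model
        pose = sample_link(surface, rng)           # link pose from the conditional contact model
        poses.append(pose)
        weights.append(likelihood(surface, pose))  # importance weight
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all importance weights are zero")
    return poses, [w / total for w in weights]     # normalise so the weights sum to one

# Toy usage: a 1-D surface on [0, 1], a fixed link offset of 0.1, and a likelihood
# that favours surface points near 0.5.
poses, weights = build_query_density(
    200,
    sample_surface=lambda r: r.random(),
    sample_link=lambda s, r: s + 0.1,
    likelihood=lambda s, p: math.exp(-((s - 0.5) ** 2) / 0.01),
    rng=random.Random(0),
)
```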

The parameter was chosen empirically. The time complexity for computing contact query density is , where is the number of contact query kernels, is the number of contact model kernels and is the number of test object view kernels.

6.2 Grasp Generation

Once query densities have been created for the new object for each contact model prototype, an initial set of grasps is generated (Figure 3, Stage 3). Generation is by a series of random samples. We randomly sample a grasp-view combination and then a hand-link . This triple points to a contact-model cluster and hence to a query density . A link pose is then sampled from that query density. Then a hand configuration is sampled from . Together, the seed pose and the hand configuration define a complete grasp , via forward kinematics, including the wrist pose . This is an initial ‘seed’ grasp, which will subsequently be refined. A large set of such initial solutions is generated, where means the initial solution.
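The sampling hierarchy for one seed grasp is sketched below with hypothetical data structures. Query densities are stored as weighted kernel lists keyed by grasp-view pair and hand link, and the hand configuration is drawn by picking one of the hand configuration model's kernel centres rather than sampling the full kernel density. Forward kinematics, which would turn the link pose and configuration into a wrist pose, is left out because it depends on the hand model.

```python
import random
from typing import Dict, List, Tuple

QueryDensity = Tuple[List[float], List[float]]  # (link-pose kernels, weights); 1-D stand-in

def sample_seed_grasp(
    query: Dict[Tuple[int, int], Dict[int, QueryDensity]],  # (grasp, view) -> link -> density
    hand_configs: Dict[int, List[List[float]]],             # grasp -> configuration kernels
    rng: random.Random,
):
    grasp_view = rng.choice(sorted(query))              # 1. a grasp-view combination
    link = rng.choice(sorted(query[grasp_view]))        # 2. a hand link of that pair
    poses, weights = query[grasp_view][link]
    link_pose = rng.choices(poses, weights=weights)[0]  # 3. a link pose from the query density
    config = rng.choice(hand_configs[grasp_view[0]])    # 4. a hand configuration
    # Forward kinematics (omitted) would map link, link_pose and config to a wrist pose.
    return grasp_view, link, link_pose, config

# Usage: one grasp seen from two views, with query densities for two links.
query = {(0, 0): {1: ([0.2, 0.3], [0.5, 0.5]), 2: ([0.7], [1.0])},
         (0, 1): {1: ([0.4], [1.0])}}
hand_configs = {0: [[0.1, 0.2], [0.3, 0.4]]}
seed = sample_seed_grasp(query, hand_configs, random.Random(0))
print(seed)
```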

Having generated an initial solution set , stages of optimisation and selection are then interleaved to create a sequence of solution sets for .

6.3 Grasp Optimisation

The final stage of the schema is optimisation of the candidate grasps (Figure 3, Stage 4). The objective of grasp optimisation is, given a candidate equilibrium grasp and a reach to grasp model, to find a grasp that maximises the product of the likelihoods of the query densities and the hand configuration density


where is the overall grasp likelihood and is the hand configuration model (27). The query density is the query density for the cluster prototype to which hand-link is mapped. The pose for hand-link is given by the forward kinematics of the hand, . Finally, is the set of instantiated query-densities for grasp-view pair .

Thus, whereas each initial grasp is generated using only a single query density, grasp optimisation requires evaluation of the grasp against all query densities. It is only in this improvement phase that all query densities must be used. Improvement is by simulated annealing (SA) Kirkpatrick et al. (1983). The SA temperature is decreased linearly from to over the steps. In each time step , one step of simulated annealing is applied to every grasp in .
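A generic sketch of the annealing loop with a linear temperature schedule. For a grasp, the objective would be the log of (36c): the sum of the log query densities over the involved links plus the log hand configuration density. The usage substitutes a toy two-term quadratic. The Gaussian proposal, its step size and the Metropolis acceptance rule are assumptions, since the text does not specify the move set. Each grasp is annealed independently here; the text advances every grasp in the solution set by one step per time step, which gives the same result between selection steps because grasps do not interact there.

```python
import math
import random
from typing import Callable, List

def linear_temperature(step: int, n_steps: int, t_start: float, t_end: float) -> float:
    """Temperature falling linearly from t_start (first step) to t_end (last step)."""
    if n_steps <= 1:
        return t_end
    return t_start + (t_end - t_start) * step / (n_steps - 1)

def anneal(x0: List[float], log_lik: Callable[[List[float]], float], n_steps: int,
           t_start: float, t_end: float, step_size: float, rng: random.Random):
    """Metropolis-style annealing on a log-likelihood; returns the best state seen."""
    x, lx = list(x0), log_lik(x0)
    best, lbest = list(x), lx
    for t in range(n_steps):
        temp = linear_temperature(t, n_steps, t_start, t_end)
        cand = [v + rng.gauss(0.0, step_size) for v in x]
        lc = log_lik(cand)
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if lc >= lx or rng.random() < math.exp((lc - lx) / temp):
            x, lx = cand, lc
            if lx > lbest:
                best, lbest = list(x), lx
    return best, lbest

def toy_log_lik(x: List[float]) -> float:
    # Two "experts" (say a link term and a configuration term), summed in log space.
    return -(x[0] - 2.0) ** 2 - (x[1] + 1.0) ** 2

best, best_ll = anneal([0.0, 0.0], toy_log_lik, n_steps=200, t_start=1.0,
                       t_end=0.01, step_size=0.3, rng=random.Random(1))
print(best, best_ll)
```

Working in log space turns the product of experts into a sum, which avoids numerical underflow when many links are involved.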

Figure 10: The ten human demonstrated grasps. Top row (grasps from Kopicki et al. (2015)), from left to right: pinch with support, power-box, handle, pinch, and power-tube. Bottom row (new training grasps), from left to right: pinch-bottom, rim-side, rim, power-edge, and power-handle. Top-row grasps were used in Experiment 1. Top-row and bottom-row grasps were used in Experiments 2 and 3. The grey lines show the sequence of finger tip poses on the demonstrated approach trajectory. The whole hand configuration is recorded for this whole approach trajectory. The initial pose and configuration we refer to as the pre-grasp position. For learning the contact models and the hand configuration model only the final hand pose (the yellow hand pose) is used. The point clouds are the result of registration of seven views with a wrist mounted depth camera taken during training. The training occurs with individual views. Coloured patches show contacts by finger, rather than individual hand-link.
Figure 11: Individual views, from which the view based models are trained, for the pinch-bottom grasp on an up-turned mug.

6.4 Grasp Selection

At predetermined selection steps (here steps 1 and 50 of annealing), grasps are ranked and only the most likely retained for further optimisation. During these selection steps the criterion in (36c) is augmented with an additional expert penalising collisions for the entire reach to grasp trajectory in a soft manner. This soft collision expert has a cost that rises exponentially with the greatest degree of penetration through the object point cloud by any of the hand links. We thus refine Eq. 36c:


where is now factorised into three parts, which evaluate the collision, hand configuration and query density experts, all at a given hand pose . A final refinement of the selection criterion is due to the fact that the number of hand-links in contact during a grasp varies across grasps and views. Thus the number of query densities , also varies, and so the values of and cannot be compared directly. Given the grasp with the maximum number of involved links , we therefore normalise the likelihood value (37a) with


It is this normalised likelihood that is used to rank all the generated grasps across all the grasp-view pairs during selection steps. After simulated annealing has yielded a ranked list of optimised grasp poses, they are checked for reachability given other objects in the workspace, and unreachable poses are pruned.
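A sketch of the selection step under an explicit assumption: Eq. (38) is not reproduced here, so the sketch assumes the likelihood is raised to the power M/m, where m is the number of links a grasp involves and M the largest such number. Any per-link normalisation such as the geometric mean L^(1/m) would produce the same ranking, because M is shared by all candidates. The collision, hand configuration and query density experts are passed in as precomputed log values; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, List[float], float, float]  # (name, per-link log q, log config, log collision)

def normalised_log_likelihood(link_log_liks: Sequence[float], config_log_lik: float,
                              collision_log_lik: float, max_links: int) -> float:
    """log of L ** (M / m): rescale so that grasps with fewer links are not favoured."""
    m = len(link_log_liks)
    total = sum(link_log_liks) + config_log_lik + collision_log_lik
    return total * max_links / m

def select(candidates: Sequence[Candidate], keep: int) -> List[str]:
    """Rank candidates by the normalised score and keep the best `keep` of them."""
    max_links = max(len(c[1]) for c in candidates)
    ranked = sorted(candidates, reverse=True,
                    key=lambda c: normalised_log_likelihood(c[1], c[2], c[3], max_links))
    return [c[0] for c in ranked[:keep]]

# Usage: both candidates have the same raw log-likelihood (-2.0), but "pinch"
# earns it from only two links.
cands = [("pinch", [-1.0, -1.0], 0.0, 0.0), ("power", [-0.5, -0.5, -0.5, -0.5], 0.0, 0.0)]
print(select(cands, keep=1))  # ['power']
```

Without the normalisation the two candidates would tie, even though "power" matches its training contacts better link by link.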

6.5 Grasp Execution

The remaining best scoring hand pose , evaluated with respect to (38), is then used to generate a reach to grasp trajectory. This is the command sequence that is executed on the robot.

7 Experimental Study

This section is structured as follows. First, the creation of a challenging data set is described. Second, the algorithmic variants tested are enumerated. Then, three experiments are presented in turn; they differ in the size of the training set. Experiment 1 trains with five grasps and Experiment 2 with ten. Experiment 3 introduces training from robot-generated grasps. Finally, a discussion examines each of the hypotheses in the light of the results.

7.1 Test set creation

In preparation for the experimental evaluation, a challenging test set was created. This used 40 novel test objects (Figure 2). The test cases were object-pose pairs relative to a single, fixed viewpoint. The object-pose combinations were chosen to be particularly challenging by using a pose that reduced the typical surface recovery from the fixed view. Some objects were employed in several poses, yielding a total of 49 object-pose pairs. Because both object and pose are controlled this means that we can test algorithms using a paired comparisons statistical methodology. This can yield statistically significant results for small numbers of test grasps.

The question arises as to whether it can be validated that this new data-set is indeed challenging. A previous single-view data set was also generated in Kopicki et al. (2015), using many of the same objects, but without deliberately challenging poses. The performance of the algorithm presented in Kopicki et al. (2015) on this original single-view test set was 77.8% (35/45). Therefore, to verify the challenging nature of the new test data-set, that algorithm was tested on the new single-view dataset. To make the two test-sets comparable, the same set of five training grasps presented in Kopicki et al. (2015) was used. On the new dataset, the performance of the algorithm fell to 59.2% (29/49). Since we hypothesized that the harder data set should produce a lower success rate, we applied a one-tailed Fisher's exact test, which gave a p-value of 0.043. Thus the difference between the success rates is statistically significant and is unlikely to have been caused by chance. We therefore accept the hypothesis that the new data set of object-pose pairs is more challenging than the previous data set.
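The one-tailed Fisher's exact test above can be reproduced from the two success counts alone. The sketch sums the upper tail of the hypergeometric distribution using only the Python standard library (`math.comb`, Python 3.8+); the function name is ours.

```python
from math import comb

def fisher_one_tailed(succ_a: int, n_a: int, succ_b: int, n_b: int) -> float:
    """P(group A has at least succ_a successes) under the hypergeometric null:
    a one-tailed Fisher's exact test that A's success rate exceeds B's."""
    total_succ = succ_a + succ_b
    n = n_a + n_b
    upper = min(n_a, total_succ)
    tail = sum(comb(total_succ, k) * comb(n - total_succ, n_a - k)
               for k in range(succ_a, upper + 1))
    return tail / comb(n, n_a)

# Original test set (35/45 successes) versus the new, harder one (29/49).
p = fisher_one_tailed(35, 45, 29, 49)
print(round(p, 3))
```

With the counts reported above this gives p ≈ 0.043, matching the value in the text.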

7.2 Algorithmic variations

Alg View-based New Eval Merging Features
A1 No No No Curv
A2 Yes No No Curv
A3 Yes Yes No Curv
A4 Yes Yes Yes Curv
A5 Yes Yes No FPFH
A6 Yes Yes Yes FPFH
Table 1: Algorithmic variations for Experiment 2

The paper has presented three main innovations. These are: (i) a view based representation; (ii) a method for merging contact models; (iii) a new evaluation method for calculating the likelihood of a generated grasp. In addition, the surface descriptor may be either principal curvatures or fast point feature histograms. It is clearly desirable to evaluate which of these innovations is most effective, and to study how well they work in combination. The sixteen possible combinations, however, are too many to evaluate properly on a real robot. Therefore, six different combinations were tried, each one introducing a new innovation on top of the others. These six ‘algorithmic variations’ are listed in Table 1. The algorithm reported in Kopicki et al. (2015) is variation A1. Note that variants A1 and A2 are not to be confused with Algorithm 1 (contact model clustering) and Algorithm 2 (query density computation), which are components of all variants.

The algorithms presented depend on a number of parameters, which have been presented earlier in the text. It would not be possible to systematically tune these parameters using grid search, but a small number of informal experiments were used to select the values used here. The same parameter settings were used for all algorithmic variants. It is entirely possible that better settings exist. The values used are presented in Table 2.

Receptive field
Contact model (curvature)
(linear), (angular)
Contact model selection
Clustering Contact Models
Hand Configuration Model
(number of kernels),
Query density
(number of kernels),
Grasp Generation
(number of initial solutions)
, selection steps are at ,
Table 2: Parameters of the grasp learning and inference algorithms.

7.3 Experiment 1

This experiment tests the hypothesis that, even without an enlarged set of training grasps, the combined innovations, present in A4 and A6, will improve the grasp success rate. Thus, variations A1, A4 and A6 were trained with the five grasps from Kopicki et al. (2015). A paired comparisons experiment with all 49 test cases was executed. This led to a grasp success rate of 59.2% (as reported above) for A1, a success rate of 75.5% (37/49) for A6, and 77.6% (38/49) for A4. Although the success rates for both A4 and A6 are higher than that for A1, using the two-tailed McNemar test for the difference between two proportions on paired data these differences are not statistically significant (p=0.1175 for A6:A1, p=0.0665 for A4:A1).

7.4 Experiment 2

This experiment tests hypotheses H2 and H3. H2 is the hypothesis that view-based grasp modelling enables better generation of grasps for thick objects. H3 is the hypothesis that the success rate progressively increases as innovations 1 to 3 are added. Experiment 2 also provides evidence to test hypotheses H4 and H5: that the grasp success rate will improve as training data is added, and will do so faster if all innovations are deployed. The training set was therefore increased to 10 grasps. Figure 2 shows the objects used for training. To test H2 and H3, six algorithmic variations A1:A6 were tested, as detailed in Table 1. These progressively add the three innovations. Algorithm A1 is the version described in Kopicki et al. (2015). A2 is A1 plus view-based organisation of grasp models. Algorithm A3 is A2 plus improved evaluation of grasp likelihood by density comparison. Algorithm A4 is A3 plus contact model merging. All of algorithms A1-A4 use principal curvatures as the features. As a final step, we also test the robustness of the method to changes in the surface descriptors used. Variants A5 and A6 are the equivalents of A3 and A4 respectively, but use FPFH as the surface descriptor instead of curvatures.

The success rates for each algorithmic variation are shown in Table 3. As the innovations are added the success rate rises. Applying McNemar’s test for the difference in proportions for paired data, A4 and A6 dominate A1 and A2, and these differences are highly statistically significant. A full table of p-values is shown in Table 4.

Alg # succ % succ Alg # succ % succ
A1(5) 29 59.2 A4(5) 38 77.6
A6(5) 37 75.5 A1(10) 27 55.1
A2(10) 28 57.1 A3(10) 34 69.4
A4(10) 40 81.6 A5(10) 35 71.4
A6(10) 40 81.6 A1+AT 31 63.3
A4+AT 43 87.8
Table 3: Experiments 1, 2 and 3: Grasp success rates for algorithm variations A1 (Vanilla) to A6, and for A1+AT and A4+AT (Autonomous Training). Numbers in brackets indicate the number of training examples used.
Alg pair p-value Alg pair p-value
A4(5):A1(10) 0.0153 A4(10):A1(5) 0.0218
A6(5):A1(10) 0.0162 A6(10):A1(5) 0.0153
A4(10):A1(10) 0.0009 A4+AT:A1(10) 0.0002
A6(10):A1(10) 0.0036 A4+AT:A2(10) 0.0013
A4(10):A2(10) 0.0033 A4+AT:A3(10) 0.0265
A6(10):A2(10) 0.0033 A4+AT:A5(10) 0.0433
A4+AT:A1(5) 0.0056 A4+AT:A1+AT 0.0095
Table 4: p-values for statistically significant pairwise differences between algorithms for Experiments 1, 2 and 3. Stronger algorithms are on the left. Format is Alg(X), where X is the number of training examples, Alg+AT means autonomous training.

[Figure 12 panels; columns: A1+AT (failures), A4+AT (successes), A1+AT (failures), A4+AT (successes)]

Figure 12: Comparison of grasps executed by algorithms A1+AT and A4+AT. These are 10 of the 15 cases where A1+AT failed and A4+AT succeeded. Columns are labelled by the variant.

[Figure 13 panels; columns: A1+AT (successes), A4+AT (failures), A1+AT (failures), A4+AT (failures)]

Figure 13: Comparison of grasps executed by algorithms A1+AT and A4+AT. The funnel, lemon juice bottle and small saucepan are the 3 cases where A1+AT succeeded and A4+AT failed. The upside-down mug and the tennis ball are 2 of the 3 cases where both A1+AT and A4+AT failed. Columns are labelled by the variant.

7.5 Experiment 3

This experiment provides further evidence to test hypotheses H4 and H5. Specifically, it tests what happens to the grasp success rate when the training data continues to grow. Because human demonstration is time consuming, the training data can instead be grown by using grasps generated autonomously by the robot as additional training data. This autonomous training (AT) allows the algorithm to scale. It was implemented here using a leave-one-out training regime: the robot was trained with all successful grasps except those on the test object.³ Thus, the algorithm is trained with 10 demonstrated grasps and up to 40 successful autonomously generated grasps from Experiment 2. The testing regime rotates the test object-pose pair through the complete set of 49 object-pose pairs, making it possible to conduct a paired-comparisons test against Experiment 2. This autonomous training regime was tested using the baseline variant A1 and variant A4. We refer to these variants with autonomous training as A1+AT and A4+AT respectively. Grasps are shown in the multimedia extension. Since there was no appreciable difference between A4 and A6 in Experiment 2, we did not create an additional variant for A6. For A1+AT the success rate rose to 63.3% (31/49), and for A4+AT it was 87.8% (43/49). A two-tailed McNemar test for paired data shows that the differences between A4+AT and several other variants (A1, A2, A3, A5) are statistically significant (Table 4, Figure 14).

³ The number of training grasps varied between 49 and 50. We trained with the 40 successful grasps from A4 in Experiment 2 and the 10 demonstrated grasps. If the test object-pose pair had been grasped successfully in Experiment 2, that grasp was removed from the training set, leaving 49 training examples. We used the same training set for A1+AT and A4+AT.
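The McNemar test uses only the discordant pairs. Figures 12 and 13 report 15 object-pose pairs on which only A4+AT succeeded and 3 on which only A1+AT succeeded, so the A4+AT:A1+AT p-value in Table 4 can be recomputed. The text does not state which form of the test was used; the chi-squared form with continuity correction sketched here is an assumption, though it reproduces the reported 0.0095.

```python
import math

def mcnemar_continuity(b: int, c: int) -> float:
    """Two-tailed McNemar test, chi-squared form with continuity correction (1 df).

    b and c are the discordant counts: pairs on which only one variant succeeded.
    """
    if b + c == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-squared with 1 df

# A4+AT vs A1+AT: 15 pairs where only A4+AT succeeded, 3 where only A1+AT did.
p = mcnemar_continuity(15, 3)
print(round(p, 4))
```

Pairs on which both variants succeeded or both failed carry no information about which variant is better, which is why paired testing on the same object-pose pairs can reach significance with only 49 test cases.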

Figure 14: A partial order dominance diagram for Experiments 1, 2 and 3. Algorithms are banded in rows by their success rate. More successful algorithms are higher up.
Figure 15: The number of contact models before and after compression as the number of training grasps rises.

7.6 Discussion

The hypotheses can be considered in order. Hypothesis H1 (even without an enlarged set of training grasps, the combined innovations 1-3 will improve the grasp success rate) was tested in Experiment 1. There is support for this in that the grasp success rate for A1(5) is 59.2%, whereas for A4(5) and A6(5) it is 77.6% and 75.5% respectively. Although these differences amount to 18.4 and 16.3 percentage points, they are not statistically significant (p=0.0665 and p=0.1175). This result thus provides moderate support for hypothesis H1.

Hypothesis H2 (view-based grasp modelling enables better generation of grasps for thick objects) can be tested by identifying which object-pose pairs possess a deep back-surface. Such situations force the robot to generate a grasp with a wide hand-aperture, so as to place a finger on that hidden surface. These so-called thick object-pose pairs in the dataset are listed in Table 5. To test the hypothesis the pairs of grasp outcomes for algorithmic variants A1(10) and A2(10) can be compared for this subset of 16 object-pose pairs. For this subset A1(10) has a success rate of 18.75% (3/16), whereas A2(10) has a success rate of 68.75% (11/16). Using a two-tailed McNemar test the difference is statistically significant (p-value=0.0133). This provides strong evidence in support of hypothesis H2. In addition, although it presents grasps from A4+AT and A1+AT, Figure 12 shows specific instances of grasps where this ability to grasp hidden back-surfaces of thick objects can be seen, such as the coke bottle, guttering, spray can and stapler.

Object (pose) Object (pose)
Coke bottle Guttering (top)
Lemon juice Moisturiser
Mr Muscle Mug 1 (upside-down)
Mug 4 (upside-down) Mug 5 (upside-down)
Large spray can Stapler
Tennis ball Danish ham (sideways)
Table 5: Object-pose pairs with deep hidden back-surfaces.

Hypothesis H3 (the grasp success rate improves as innovations 1-3 are added) is again supported by the monotonic increase in grasp success rate as innovations 1, 2 and 3 are added in order. The success rate rises from 55.1% (A1(10)), through 57.1% (A2(10)), 69.4% (A3(10)), to 81.6% (A4(10)). Some of these differences are statistically significant. Notably, both variations A4(10) and A6(10) are better than A1(10) and A2(10) and the differences are either highly or extremely significant. This provides good support for hypothesis H3.

Hypothesis H4 (with all innovations the grasp success rate improves as the training data increases) can be tested by examining the change in success rate as data is added. Algorithm A4 was tested with the full range of training set sizes of 5, 10 and 49-50 grasps. The evidence supports H4, since the success rate rises from 77.6% (A4(5)) through 81.6% (A4(10)) to 87.8% (A4+AT).

Hypothesis H5 (with all three innovations, learning is better able than the baseline algorithm to exploit an increased amount of training data) can be tested by comparing the figures for A4 to those for A1 across all three training regimes. For A1 the corresponding success rates are 59.2%, 55.1% and 63.3%. Thus, when moving from five training examples to fifty, algorithm A1 improved by 4.1 percentage points, whereas A4 improved by 10.2. This supports hypothesis H5. We suggest that this is because of the use of contact model merging. Figure 15 shows that the compression ratio (initial models:clusters) increases as the training set grows. This shows the effect on the number of models. The effect of this compression on the grasp success rate is shown by comparing the success rate of A4 with A3 (81.6%:69.4%), and A6 with A5 (81.6%:71.4%). Thus, adding contact model merging improves the success rate by 10 to 12 percentage points. This provides support for the idea that the advantage extracted from additional data is due in part to contact model merging.

Finally, hypothesis H6 (with all innovations, the algorithm dominates the baseline algorithm without any innovations) is tested by examining the results of all three experiments. Variants A4 and A6 outperform the corresponding version of A1 regardless of the training regime. Figure 14 shows that all of these differences, except A4(5) and A6(5) versus A1(5), are statistically significant. In addition, Figure 14 shows that A4+AT dominates A1 under every training regime, including A1's best version (A1+AT), and that these differences are either highly or extremely statistically significant. These results provide very strong support for H6.
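
The percentage-point comparisons behind H4, H5 and H6 can be reproduced from the success rates quoted above. The sketch compares point estimates only and does not test significance; the dictionary layout and names are illustrative.

```python
# Reported grasp success rates (%) from this section. Regimes: "5" and "10"
# training grasps, and "AT" (autonomous training, 49-50 grasps).
RATES = {
    "A1": {"5": 59.2, "10": 55.1, "AT": 63.3},
    "A4": {"5": 77.6, "10": 81.6, "AT": 87.8},
}


def gain_pp(before, after):
    """Change in success rate in percentage points, at the tables' 0.1 precision."""
    return round(after - before, 1)


# H4: with all innovations, the success rate rises with more training data.
h4_monotonic = RATES["A4"]["5"] < RATES["A4"]["10"] < RATES["A4"]["AT"]

# H5: gain from the smallest to the largest training regime.
a1_gain = gain_pp(RATES["A1"]["5"], RATES["A1"]["AT"])
a4_gain = gain_pp(RATES["A4"]["5"], RATES["A4"]["AT"])

# Effect of adding contact model merging (pairs quoted in the H5 paragraph).
merge_gains = [gain_pp(69.4, 81.6), gain_pp(71.4, 81.6)]  # A3 -> A4, A5 -> A6

# H6 (point estimates only): A4 beats A1 in every regime, and even A4's
# worst regime beats A1's best.
dominates = all(RATES["A4"][r] > RATES["A1"][r] for r in RATES["A1"])
worst_a4_beats_best_a1 = min(RATES["A4"].values()) > max(RATES["A1"].values())

print(h4_monotonic, a1_gain, a4_gain, merge_gains, dominates, worst_a4_beats_best_a1)
```

Running it prints `True 4.1 10.2 [12.2, 10.2] True True`. Rounding to one decimal place avoids spurious digits from floating-point subtraction.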

We also note that there is no evident difference in performance caused by the choice of surface descriptor.

Experiment  Query density computation, A1 (s)  Query density computation, A4 (s)  Generation & optimisation, A1 (s)  Generation & optimisation, A4 (s)
1  0.41  7.95  5.3  4.52
2  1.04  14.47  6.89  4.55
3  1.70  17.77  8.66  4.64
Table 6: Computation time, per test object, for algorithms A1 and A4.

Next, we show the computation times for variants A1 and A4 in Table 6. These were measured on a PC with two Xeon E5-2650V2 CPUs. The comparison shows that the new algorithm is slower at query density computation. This is because of the new evaluation function, which is roughly 26 times more expensive than the previous one. This factor, however, is constant, whereas the number of models that A1 must evaluate rises linearly with the number of training grasps, and contact model merging makes the number of models in A4 grow more slowly. As the number of training grasps rises, A1 will therefore eventually exceed A4 in computation time. For generation and optimisation, A4 is already faster than A1 in all three experiments. The absolute time for A4 is, however, higher than would be suitable for real-world use. The algorithm is well suited to GPU implementation, which would significantly reduce absolute computation time.
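
The scaling argument can be checked loosely against Table 6 by computing how the A4:A1 time ratios change across the three experiments. The sketch copies the table values; the names are illustrative, and this section does not state which training-set size each experiment number corresponds to, so the trend should be read with care.

```python
# Table 6 values (seconds per test object), copied from the table:
# experiment number -> (A1, A4).
QUERY_DENSITY_SECS = {1: (0.41, 7.95), 2: (1.04, 14.47), 3: (1.70, 17.77)}
GEN_OPT_SECS = {1: (5.3, 4.52), 2: (6.89, 4.55), 3: (8.66, 4.64)}

# How many times longer A4 takes than A1 for query density computation.
query_slowdown = {e: a4 / a1 for e, (a1, a4) in QUERY_DENSITY_SECS.items()}
# How many times faster A4 is than A1 for generation and optimisation.
gen_opt_speedup = {e: a1 / a4 for e, (a1, a4) in GEN_OPT_SECS.items()}

for e in sorted(QUERY_DENSITY_SECS):
    print(f"experiment {e}: query density A4/A1 = {query_slowdown[e]:.1f}x, "
          f"generation & optimisation A1/A4 = {gen_opt_speedup[e]:.2f}x")
```

The query density slowdown falls from about 19.4x to 13.9x and then 10.5x, below the roughly 26x per-evaluation factor, while A4's advantage in generation and optimisation grows from about 1.17x to 1.87x. Both trends are consistent with, though not proof of, the argument that merging slows the growth of A4's workload.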

Object (pose)  Successes (out of 11)
Kettle 6
Large spray-can 6
Large funnel (sideways) 5
Large saucepan 5
Mug 1 (upside-down) 4
Mug 3 4
Mug 5 (upside-down) 4
Mug 4 (upside-down) 3
Small saucepan 3
Tennis ball 2
Danish ham (sideways) 0
Table 7: Most challenging object-pose pairs.

Finally, we highlight the most challenging object-pose pairs. Eleven grasps were executed per object-pose pair across all variants and experiments. Table 7 lists the pairs with six or fewer successes. These objects were difficult mostly because they had to be grasped around their body, whose size was very close to the maximum aperture of the dexterous hand used.
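
For a single summary figure, the per-pair counts in Table 7 can be pooled. The sketch copies the table and computes each pair's success fraction and the pooled rate over these hardest pairs; the names are illustrative.

```python
# Table 7: successes out of 11 grasps per object-pose pair (copied from the table).
HARD_PAIRS = {
    "Kettle": 6,
    "Large spray-can": 6,
    "Large funnel (sideways)": 5,
    "Large saucepan": 5,
    "Mug 1 (upside-down)": 4,
    "Mug 3": 4,
    "Mug 5 (upside-down)": 4,
    "Mug 4 (upside-down)": 3,
    "Small saucepan": 3,
    "Tennis ball": 2,
    "Danish ham (sideways)": 0,
}
GRASPS_PER_PAIR = 11

# Pooled success rate: all successes over all grasps on these pairs.
pooled = sum(HARD_PAIRS.values()) / (GRASPS_PER_PAIR * len(HARD_PAIRS))

for name, s in sorted(HARD_PAIRS.items(), key=lambda kv: kv[1]):
    print(f"{name:25s} {s:2d}/{GRASPS_PER_PAIR} ({100 * s / GRASPS_PER_PAIR:.0f}%)")
print(f"pooled success rate on these pairs: {100 * pooled:.1f}%")
```

Across these eleven pairs, 42 of 121 grasps succeed (about 34.7%).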

8 Conclusions

Dexterous grasping of novel objects given a single view is an important problem that must be solved if dexterous grasping is to be deployed in real-world settings. While good progress has been made on simple pinch grasping from a single view, dexterous grasping is significantly more challenging, because the higher dimensionality of the hand enlarges the search space.

This paper has presented a number of technical innovations that, when combined, improve grasp performance: view-based grasp modelling, contact model merging, and a new method for generating and evaluating contacts. These innovations make it practical to use a larger number of training examples. This, in turn, enables a change to the training methodology, in which the grasps generated and executed for novel objects are fed back as further training examples. An empirical evaluation of the algorithms, on a data set of challenging grasp-view combinations, showed substantial differences in grasp success rate between some variants, from 55.1% for the algorithm reported in Kopicki et al. (2015) to 87.8% for a variant that employs all three innovations plus autonomous training. Furthermore, these differences are statistically significant.

The authors gratefully acknowledge funding from the PaCMan project FP7-IST-600918. The source code for the algorithms reported here is publicly available as part of the Golem robot manipulation stack: https://github.com/RobotsLab/Golem


  • Arruda et al. (2016) Arruda E, Wyatt J and Kopicki M (2016) Active vision for dexterous grasping of novel objects. In: Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, pp. 2881–2888. URL http://doi.org/10.1109/IROS.2016.7759446.
  • Bekiroglu et al. (2011a) Bekiroglu Y, Huebner K and Kragic D (2011a) Integrating grasp planning with online stability assessment using tactile sensing. In: International Conference on Robotics and Automation. IEEE, pp. 4750–4755. URL https://doi.org/10.1109/ICRA.2011.5980049.
  • Bekiroglu et al. (2011b) Bekiroglu Y, Laaksonen J, Jorgensen JA, Kyrki V and Kragic D (2011b) Assessing grasp stability based on learning and haptic data. IEEE Transactions on Robotics 27(3): 616–629. URL https://doi.org/10.1109/TRO.2011.2132870.
  • Ben Amor et al. (2012) Ben Amor H, Kroemer O, Hillenbrand U, Neumann G and Peters J (2012) Generalization of human grasping for multi-fingered robot hands. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 2043–2050. URL https://doi.org/10.1109/IROS.2012.6386072.
  • Bicchi and Kumar (2000) Bicchi A and Kumar V (2000) Robotic grasping and contact: a review. In: International Conference on Robotics and Automation. IEEE, pp. 348–353. URL https://doi.org/10.1109/ROBOT.2000.844081.
  • Bohg et al. (2011) Bohg J, Johnson-Roberson M, León B, Felip J, Gratal X, Bergstrom N, Kragic D and Morales A (2011) Mind the gap – robotic grasping under incomplete observation. In: International Conference on Robotics and Automation. IEEE, pp. 686–693. URL https://doi.org/10.1109/ICRA.2011.5980354.
  • Bohg and Kragic (2010) Bohg J and Kragic D (2010) Learning grasping points with shape context. Robotics and Autonomous Systems 58(4): 362–377. URL https://doi.org/10.1016/j.robot.2009.10.003.
  • Bohg et al. (2014) Bohg J, Morales A, Asfour T and Kragic D (2014) Data-driven grasp synthesis – a survey. IEEE Transactions on Robotics 30(2): 289–309. URL http://doi.org/10.1109/TRO.2013.2289018.
  • Bousmalis et al. (2018) Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K, Levine S and Vanhoucke V (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping URL http://arxiv.org/abs/1709.07857.
  • Boutselis et al. (2014) Boutselis G, Bechlioulis C, Liarokapis M and Kyriakopoulos K (2014) Task specific robust grasping for multifingered robot hands. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 858–863. URL https://doi.org/10.1109/IROS.2014.6942660.
  • Ciocarlie et al. (2005) Ciocarlie M, Miller A and Allen P (2005) Grasp analysis using deformable fingers. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005. (IROS 2005). IEEE, pp. 4122–4128. URL https://doi.org/10.1109/IROS.2005.1545525.
  • Ciocarlie and Allen (2009) Ciocarlie MT and Allen PK (2009) Hand posture subspaces for dexterous robotic grasping. The International Journal of Robotics Research 28(7): 851–867. URL https://doi.org/10.1177/0278364909105606.
  • Detry et al. (2013) Detry R, Ek C, Madry M and Kragic D (2013) Learning a dictionary of prototypical grasp-predicting parts from grasping experience. In: International Conference on Robotics and Automation. IEEE, pp. 601–608. URL https://doi.org/10.1109/ICRA.2013.6630635.
  • Detry et al. (2011) Detry R, Kraft D, Kroemer O, Bodenhagen L, Peters J, Krüger N and Piater J (2011) Learning grasp affordance densities. Paladyn. Journal of Behavioral Robotics 2(1): 1–17. URL http://doi.org/10.2478/s13230-011-0012-x.
  • Dogar et al. (2012) Dogar M, Hsiao K, Ciocarlie M and Srinivasa S (2012) Physics-based grasp planning through clutter. In: Proceedings of Robotics: Science and Systems VIII. pp. 57–64. URL https://doi.org/10.15607/RSS.2012.VIII.008.
  • Dogar and Srinivasa (2010) Dogar M and Srinivasa S (2010) Push-grasping with dexterous hands: Mechanics and a method. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010). pp. 2123–2130. URL http://doi.org/10.1109/IROS.2010.5652970.
  • Ekvall and Kragic (2004) Ekvall S and Kragic D (2004) Interactive grasp learning based on human demonstration. In: IEEE International Conference on Robotics and Automation. pp. 3519–3524. URL http://doi.org/10.1109/ROBOT.2004.1308798.
  • Ferrari and Canny (1992) Ferrari C and Canny J (1992) Planning optimal grasps. In: International Conference on Robotics and Automation. pp. 2290–2295. URL https://doi.org/10.1109/ROBOT.1992.219918.
  • Fischinger and Vincze (2012) Fischinger D and Vincze M (2012) Empty the basket – a shape based learning approach for grasping piles of unknown objects. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 2051–2057. URL https://doi.org/10.1109/IROS.2012.6386137.
  • Fisher (1953) Fisher RA (1953) Dispersion on a sphere. In: Proc. Roy. Soc. London Ser. A., volume 217. Royal Society, pp. 295–305.
  • Frey and Dueck (2007) Frey BJ and Dueck D (2007) Clustering by passing messages between data points. Science 315(5814): 972–976. URL https://doi.org/10.1126/science.1136800.
  • Goins et al. (2014) Goins AK, Carpenter R, Wong WK and Balasubramanian R (2014) Evaluating the efficacy of grasp metrics for utilization in a Gaussian process-based grasp predictor. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE/RSJ, pp. 3353–3360. URL https://doi.org/10.1109/IROS.2014.6943029.
  • Goldfeder and Allen (2011) Goldfeder C and Allen PK (2011) Data-driven grasping. Autonomous Robots 31(1): 1–20. URL https://doi.org/10.1007/s10514-011-9228-1.
  • Gori et al. (2014) Gori I, Pattacini U, Tikhanoff V and Metta G (2014) Three-finger precision grasp on incomplete 3D point clouds. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 5366–5373. URL https://doi.org/10.1109/ICRA.2014.6907648.
  • Gualtieri et al. (2016) Gualtieri M, ten Pas A, Saenko K and Platt R (2016) High precision grasp pose detection in dense clutter. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 598–605. URL https://doi.org/10.1109/IROS.2016.7759114.
  • Hang et al. (2014) Hang K, Stork J, Pokorny F and Kragic D (2014) Combinatorial optimization for hierarchical contact-level grasping. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 381–388. URL https://doi.org/10.1109/ICRA.2014.6906885.
  • Herzog et al. (2014) Herzog A, Pastor P, Kalakrishnan M, Righetti L, Bohg J, Asfour T and Schaal S (2014) Learning of grasp selection based on shape-templates. Autonomous Robots 36(1-2): 51–65. URL https://doi.org/10.1007/s10514-013-9366-8.
  • Hillenbrand and Roa (2012) Hillenbrand U and Roa M (2012) Transferring functional grasps through contact warping and local replanning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 2963–2970. URL https://doi.org/10.1109/IROS.2012.6385989.
  • Hsiao et al. (2010) Hsiao K, Chitta S, Ciocarlie MT and Jones EG (2010) Contact-reactive grasping of objects with partial shape information. In: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 1228–1235. URL https://doi.org/10.1109/IROS.2010.5649494.
  • Hsiao et al. (2011) Hsiao K, Kaelbling LP and Lozano-Pérez T (2011) Robust grasping under object pose uncertainty. Autonomous Robots 31(2-3): 253. URL https://doi.org/10.1007/s10514-011-9243-2.
  • Hsiao and Lozano-Perez (2006) Hsiao K and Lozano-Perez T (2006) Imitation learning of whole-body grasps. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 5657–5662. URL http://dx.doi.org/10.1109/IROS.2006.282366.
  • Huebner and Kragic (2008) Huebner K and Kragic D (2008) Selection of robot pre-grasps using box-based shape approximation. In: Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on. IEEE, pp. 1765–1770. URL https://doi.org/10.1109/IROS.2008.4650722.
  • Huebner et al. (2008) Huebner K, Ruthotto S and Kragic D (2008) Minimum volume bounding box decomposition for shape approximation in robot grasping. In: Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on. IEEE, pp. 1628–1633. URL https://doi.org/10.1109/ROBOT.2008.4543434.
  • Hyttinen et al. (2015) Hyttinen E, Kragic D and Detry R (2015) Learning the tactile signatures of prototypical object parts for robust part-based grasping of novel objects. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 4927–4932. URL https://doi.org/10.1109/ICRA.2015.7139883.
  • Johns et al. (2016) Johns E, Leutenegger S and Davison AJ (2016) Deep learning a grasp function for grasping under gripper pose uncertainty. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 4461–4468. URL https://doi.org/10.1109/IROS.2016.7759657.
  • Kanatani (2005) Kanatani K (2005) Statistical optimization for geometric computation: theory and practice. Courier Dover Publications.
  • Kanoulas et al. (2017) Kanoulas D, Lee J, Kanoulas D and Tsagarakis N (2017) Visual grasp affordance localization in point clouds using curved contact patches. International Journal of Humanoid Robotics 14(1): 1–21. URL https://doi.org/10.1142/S0219843616500286.
  • Kappler et al. (2015) Kappler D, Bohg J and Schaal S (2015) Leveraging big data for grasp planning. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 4304–4311. URL https://doi.org/10.1109/ICRA.2015.7139793.
  • Kim et al. (2013) Kim J, Iwamoto K, Kuffner JJ, Ota Y and Pollard NS (2013) Physically based grasp quality evaluation under pose uncertainty. IEEE Transactions on Robotics 29(6): 1424 – 1439.
  • Kirkpatrick et al. (1983) Kirkpatrick S, Gelatt CD and Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598): 671–680. URL https://doi.org/10.1126/science.220.4598.671.
  • Klingbeil et al. (2011) Klingbeil E, Rao D, Carpenter B, Ganapathi V, Ng A and Khatib O (2011) Grasping with application to an autonomous checkout robot. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 2837–2844. URL https://doi.org/10.1109/ICRA.2011.5980287.
  • Kootstra et al. (2012) Kootstra GW, Popović M, Jørgensen JA, Kuklinski K, Miatliuk K, Kragic D and Krüger N (2012) Enabling grasping of unknown objects through a synergistic use of edge and surface information. The International Journal of Robotics Research 34: 26–42. URL http://dx.doi.org/10.1177/0278364915594244.
  • Kopicki et al. (2015) Kopicki M, Detry R, Adjigble M, Stolkin R, Leonardis A and Wyatt JL (2015) One-shot learning and generation of dexterous grasps for novel objects. The International Journal of Robotics Research 35: 959–976. URL https://doi.org/10.1177/0278364915594244.
  • Kopicki et al. (2014) Kopicki M, Detry R, Schmidt F, Borst C, Stolkin R and Wyatt JL (2014) Learning dextrous grasps that generalise to novel objects by combining hand and contact models. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 5358–5365. URL http://doi.org/10.1109/ICRA.2014.6907647.
  • Kragic and Christensen (2003) Kragic D and Christensen HI (2003) Robust visual servoing. The International Journal of Robotics Research 22(10-11): 923–939. URL https://doi.org/10.1177/027836490302210009.
  • Kroemer et al. (2012) Kroemer O, Ugur E, Oztop E and Peters J (2012) A kernel-based approach to direct action perception. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 2605–2610. URL https://doi.org/10.1109/ICRA.2012.6224957.
  • Kumra and Kanan (2017) Kumra S and Kanan C (2017) Robotic grasp detection using deep convolutional neural networks. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 769–776. URL https://doi.org/10.1109/IROS.2017.8202237.
  • Lenz et al. (2015) Lenz I, Lee H and Saxena A (2015) Deep learning for detecting robotic grasps. The International Journal of Robotics Research 34(4–5): 705–724. URL https://doi.org/10.1177/0278364914549607.
  • Levine et al. (2017) Levine S, Pastor P, Krizhevsky A, Ibarz J and Quillen D (2017) Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. International Journal of Robotics Research URL http://dx.doi.org/10.1177/0278364917710318.
  • Li et al. (2016) Li M, Hang K, Kragic D and Billard A (2016) Dexterous grasping under shape uncertainty. Robotics and Autonomous Systems 75: 352–364. URL http://dx.doi.org/10.1016/j.robot.2015.09.008.
  • Liu (2000) Liu YH (2000) Computing n-finger form-closure grasps on polygonal objects. The International Journal of Robotics Research 19(2): 149–158. URL https://doi.org/10.1177/02783640022066798.
  • Lu et al. (2017) Lu Q, Chenna K, Sundaralingam B and Hermans T (2017) Planning multi-fingered grasps as probabilistic inference in a learned deep network. In: International Symposium on Robotics Research. URL https://arxiv.org/abs/1804.03289.
  • Mahler et al. (2017) Mahler J, Liang J, Niyaz S, Laskey M, Doan R, Liu X, Aparicio J and Goldberg K (2017) Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In: Robotics Science and Systems XIII (RSS). URL http://www.roboticsproceedings.org/rss13/p58.html.
  • Mahler et al. (2016) Mahler J, Pokorny F, Hou B, Roderick M, Laskey M, Aubry M, Kohlhoff K, Kröger T, Kuffner J and Goldberg K (2016) Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a multi-armed bandit model with correlated rewards. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 1957–1964. URL https://doi.org/10.1109/ICRA.2016.7487342.
  • Miller and Allen (2004) Miller A and Allen P (2004) Graspit! a versatile simulator for robotic grasping. IEEE Robotics & Automation Magazine 11(4): 110–122. URL https://doi.org/10.1109/MRA.2004.1371616.
  • Pinto and Gupta (2016) Pinto L and Gupta A (2016) Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 3406–3412. URL https://doi.org/10.1109/ICRA.2016.7487517.
  • Platt et al. (2006) Platt R, Grupen R and Fagg A (2006) Learning grasp context distinctions that generalize. In: IEEE-RAS International Conference on Humanoid Robots. IEEE, pp. 504–511. URL https://doi.org/10.1109/ICHR.2006.321320.
  • Pollard (2004) Pollard N (2004) Closure and quality equivalence for efficient synthesis of grasps from examples. The International Journal of Robotics Research 23(6): 595–613. URL https://doi.org/10.1177/0278364904044402.
  • Ponce and Faverjon (1995) Ponce J and Faverjon B (1995) On computing three-finger force-closure grasps of polygonal objects. IEEE Transactions on Robotics and Automation 11(6): 868–881. URL https://doi.org/10.1109/70.478433.
  • Popović et al. (2010) Popović M, Kraft D, Bodenhagen L, Başeski E, Pugeault N, Kragic D, Asfour T and Krüger N (2010) A strategy for grasping unknown objects based on co-planarity and colour information. Robotics and Autonomous Systems 37: 551–565. URL https://doi.org/10.1016/j.robot.2010.01.003.
  • Redmon and Angelova (2015) Redmon J and Angelova A (2015) Real-time grasp detection using convolutional neural networks. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 1316–1322. URL https://doi.org/10.1109/ICRA.2015.7139361.
  • Richtsfeld and Zillich (2008) Richtsfeld M and Zillich M (2008) Grasping unknown objects based on 2.5D range data. In: IEEE International Conference on Automation Science and Engineering. IEEE, pp. 691–696. URL https://doi.org/10.1109/COASE.2008.4626412.
  • Roa and Suarez (2015) Roa M and Suarez R (2015) Grasp quality measures: Review and performance. Autonomous Robots 38(1): 65–88. URL https://doi.org/10.1007/s10514-014-9402-3.
  • Rosales et al. (2014) Rosales C, Ajoudani A, Gabiccini M and Bicchi A (2014) Active gathering of frictional properties from objects. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. ISBN 978-1-4799-6934-0, pp. 3982–3987. URL https://doi.org/10.1109/IROS.2014.6943122.
  • Rosales et al. (2012) Rosales C, Suárez R, Gabiccini M and Bicchi A (2012) On the synthesis of feasible and prehensile robotic grasps. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 550–556. URL https://doi.org/10.1109/ICRA.2012.6225238.
  • Rusu et al. (2009a) Rusu R, Blodow N and Beetz M (2009a) Fast point feature histograms (FPFH) for 3D registration. In: IEEE International Conference on Robotics and Automation. pp. 2155–2162. URL https://doi.org/10.1109/ROBOT.2009.5152473.
  • Rusu et al. (2009b) Rusu RB, Holzbach A, Diankov R, Bradski G and Beetz M (2009b) Perception for mobile manipulation and grasping using active stereo. In: IEEE-RAS International Conference on Humanoids. IEEE, pp. 632–638. URL https://doi.org/10.1109/ICHR.2009.5379597.
  • Sahbani et al. (2012) Sahbani A, El-Khoury S and Bidaud P (2012) An overview of 3d object grasp synthesis algorithms. Robotics and Autonomous Systems 60(3): 326–336. URL https://doi.org/10.1016/j.robot.2011.07.016.
  • Saut and Sidobre (2012) Saut J and Sidobre D (2012) Efficient models for grasp planning with a multi-fingered hand. Robotics and Autonomous Systems 60(3): 347–357. URL https://doi.org/10.1016/j.robot.2011.07.019.
  • Saxena et al. (2008) Saxena A, Wong LLS and Ng AY (2008) Learning grasp strategies with partial shape information. In: Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, AAAI’08. AAAI Press. ISBN 978-1-57735-368-3, pp. 1491–1494. URL http://dl.acm.org/citation.cfm?id=1620270.1620316.
  • Seita et al. (2016) Seita D, Pokorny F, Mahler J, Kragic D, Franklin M, Canny J and Goldberg K (2016) Large-scale supervised learning of the grasp robustness of surface patch pairs. In: IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR). IEEE, pp. 216–223. URL https://doi.org/10.1109/SIMPAR.2016.7862399.
  • Shapiro et al. (2004) Shapiro A, Rimon E and Burdick JW (2004) On the mechanics of natural compliance in frictional contacts and its effect on grasp stiffness and stability. In: IEEE International Conference on Robotics and Automation, volume 2. IEEE, pp. 1264–1269. URL https://doi.org/10.1109/ROBOT.2004.1307998.
  • Shimoga (1996) Shimoga KB (1996) Robot grasp synthesis algorithms: A survey. The International Journal of Robotics Research 15(3): 230–266. URL https://doi.org/10.1177/027836499601500302.
  • Silverman (1986) Silverman BW (1986) Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC.
  • Spivak (1999) Spivak M (1999) A Comprehensive Introduction to Differential Geometry, volume 1. Publish or Perish, Berkeley.
  • Sudderth (2006) Sudderth EB (2006) Graphical models for visual object recognition and tracking. PhD Thesis, MIT, Cambridge, MA. URL http://cs.brown.edu/~sudderth/papers/sudderthPhD.pdf.
  • ten Pas and Platt (2014) ten Pas A and Platt R (2014) Localizing handle-like grasp affordances in 3D point clouds. In: International Symposium on Experimental Robotics. pp. 623–638. URL https://doi.org/10.1007/978-3-319-23778-7_41.
  • Trobina and Leonardis (1995) Trobina M and Leonardis A (1995) Grasping arbitrarily shaped 3-D objects from a pile. In: IEEE International Conference on Robotics and Automation. IEEE, pp. 241–246. URL https://doi.org/10.1109/ROBOT.1995.525292.
  • Varley et al. (2017) Varley J, DeChant C, Richardson A, Ruales J and Allen P (2017) Shape completion enabled robotic grasping. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 2442–2447. URL https://doi.org/10.1109/IROS.2017.8206060.
  • Varley et al. (2015) Varley J, Weisz J, Weiss J and Allen P (2015) Generating multi-fingered robotic grasps via deep learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 4415–4420. URL https://doi.org/10.1109/IROS.2015.7354004.
  • Veres et al. (2017) Veres M, Moussa M and Taylor GW (2017) Modeling grasp motor imagery through deep conditional generative models. IEEE Robotics and Automation Letters 2(2): 757–764. URL https://doi.org/10.1109/LRA.2017.2651945.
  • Wang et al. (2016) Wang Z, Li Z, Wang B and Liu H (2016) Robot grasp detection using multimodal deep convolutional neural networks. Advances in Mechanical Engineering 8(9): 1–12. URL https://doi.org/10.1177/1687814016668077.
  • Zheng and Qian (2005) Zheng Y and Qian WH (2005) Coping with the grasping uncertainties in force-closure analysis. The International Journal of Robotics Research 24(4): 311–327. URL https://doi.org/10.1177/0278364905049469.
  • Zhou and Hauser (2017) Zhou Y and Hauser K (2017) 6DoF grasp planning by optimizing a deep learning scoring function. In: Robotics: Science and Systems (RSS) —Turning a Problem into a Solution.