1 Introduction
Imagine that you are reaching into the fridge to grasp an object you can only partially see. Rather than relying solely on vision, you must use touch in order to localise it and securely grasp it. However, humans would not poke the object to localise it first and then grasp it. We compensate for the uncertainty by approaching the object in a way such that if a contact occurs it will generate enough information about where the object is and the object will be grasped with a minimum adaptation of the initial trajectory.
Previous work attempted to couple the uncertainty reduction and grasp execution in the framework of partially observable Markov decision processes (POMDPs). Although methods are advancing for continuous state
(Porta et al., 2006; Bai et al., 2010; Brooks et al., 2006) and continuous action spaces (Porta et al., 2006; Murphy, 2000), no POMDP planners yet scale to the high dimensional belief state and action spaces required for robot grasping. This is especially true for manipulators with great dexterity as they have a high number of degrees of freedom. Instead, actual robot implementations of the POMDP approach to active tactile grasping separate exploratory and grasping actions and plan how to sequence them
(Hsiao et al., 2011b), typically by relying on a userdefined threshold to know when the belief is accurate enough to switch from gathering information to execution of a precomputed grasp action. This approach fails to exploit the fact that, in tactile grasping, hand movements can both move towards the grasp configuration, and reduce uncertainty. They are most naturally performed concurrently, rather than sequentially. Furthermore, these approaches typically rely on constraining the belief space to Gaussian distributions. Extensions to nonparametric representations of belief (Nikandrova et al., 2013), typically result in intractable planning problems due to the high dimensionality of nonGaussian parametrisation.This work presents a formulation of dexterous manipulation that aims to exploit concurrency in exploratory and grasping actions for reachtograsp hand movements. The properties of the approach are that it:

tracks highdimensional belief states defined over a 6D, nonGaussian pose uncertainty;

efficiently plans in a fixeddimensional space;

simultaneously gathers information while grasping, i.e., there is no need to switch between gathering information and grasping since the action space is the space of dexterous reachtograsp trajectories;

does not require a usersupplied mesh model of the object or a precomputed grasp associated with the object;

copes also with nonconvex objects, i.e., there are no limitations to the shape of the objects that it can successfully grasp.
We build our approach by combining the idea of hypothesisbased planning (HBP), initially proposed in (Platt et al., 2011), and our oneshot learning algorithm for dexterous grasping of novel objects (Kopicki et al., 2015). The hypothesisbased planner works as follows. Instead of planning directly in a high dimensional belief space, our plan is constructed on a fixeddimensional, sampled representation of belief. In other words, the belief space is projected onto a set of competing hypothesis in the underlying state space. However, our implementation of the HBP algorithm extends the work in (Platt et al., 2011)
in several directions. First, Platt’s formulation of the hypothesisbased planner is defined on a set of actions (i.e., movement constrained in the horizontal plane) that differs from the actual grasp (i.e. a pinch grasp with two paddles). In contrast, we formulate the problem on the same action space for each stage (i.e., dexterous reachtograsp trajectories). As a result, we do not require a usersupplied threshold over the current belief to estimate when to interrupt the information gathering phase and execute a predefined grasp. Another difference is that the observational model used in
(Platt et al., 2011) relies on contactless sensors (i.e. laser sensors), while we maximise tactile observations for a dexterous robotic hand; and, finally, we do not make any assumptions about the model of the object to be grasped, in contrast to the original work that assumes a convex object (i.e. a box) aligned in front of the robot (see Sec. 2.3 for further details). On top of our hypothesisbased planner, our grasping algorithm enables us to learn a set of grasp contacts on a pointcloud object model (PCOM), directly obtainable from a depth camera, and to generalise to objects of novel shape. Therefore we do not require a mesh model of the object, and we can also generate grasps on target objects with incomplete point clouds. Hence our algorithm is exceptionally flexible in planning dexterous reachtograsp trajectories for novel objects.In order to link these two methods, hypothesis based planning and dexterous grasping of novel objects, we need to construct a representation of the belief space that will allow us to track pose uncertainty for a PCOM, in 6D, and cope with the nonGaussian posterior. We do so by employing a nonparametric representation of the belief state defined as a kernel density estimator. Each kernel is a weighted pose of the target object inferred from visual data collected on the fly (see Sec.
3.1).Our experimental results show that our planner, IR3ne, is more reliable than openloop grasping from vision. We further show that IR3ne improves over simple tactile replanning in three ways: i) it doubles the proportion of times in which the robot reaches and grasps the object at the first attempt, ii) if an unexpected contact is generated along the trajectory, the contact provides more information about the object’s pose, and thus iii) it reduces the number of replanning steps before converging to a grasp. Experiments in simulation and on a real robot (Fig. 1) confirm these attractive properties.
We continue with related work (Sec. 2), then describe the planning problem formulation (Sec. 3). We describe all aspects of the framework (Sec. 4), and report and discuss the experimental results (Sec. 5). We finish with concluding remarks (Sec. 6).
Tracking highdimensional belief 
6D nonGaussian pose uncertainty 
Planning in a fixeddimensional space 
Actively gathering tactile information 
Simultaneously IG & grasping 
Switching IG & grasping 
Planning for dexterous grasp 
Coping with nonconvex objects 
No need for usersupplied grasp 
Recovering from incorrect plans 

HBP  ✓  ✗  ✓  ✗  ✓  ✗  ✗  ✗  ✗  ✓ 
ISBP  ✓  ✗  ✗  ✗  ✗  ✓  ✗  ✗  ✗  ✓ 
POMDP  ✓  ✗  ✗  ✓  ✗  ✗  ✗  ✗  ✗  ✓ 
IR3ne  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓ 
2 Related Work
The problem of robot grasping under object pose uncertainty can be decomposed into: state estimation, grasp synthesis, reach to grasp planning, and control. A typical approach is to represent the belief state using prior distributions, select a grasp robust w.r.t. uncertainty and finally use tactile feedback to adjust the grasping trajectory, see e.g. (Nikandrova et al., 2013). The reachtograsp trajectory is generally computed using samplingbased techniques which minimise the trajectory cost. Less work has explored the more complex problem of reasoning about uncertainty while planning this dexterous reachtograsp trajectory.
Table 1 summarises the main differences between our approach and the closest related work at glance.
2.1 State estimation
One class of approaches to state estimation uses a maximum likelihood estimate (MLE), e.g. (Kopicki et al., 2014). Given an object model, a typical robust global estimator samples subsets of points and computes hypothesised poses based on feature correspondences between the data and the model (Hillenbrand, 2008; Tuzel et al., 2005; Hillenbrand and Fuchs, 2011; Bracci et al., 2017). Nonetheless, even for full shape matching, small errors in the pose estimate may lead to critical failures while attempting to grasp, i.e., unexpected contacts may damage the object or the hand itself. Another class of approaches maintains a belief state
, which is a probability distribution over the underlying states. Maintaining a density over the pose of the object yields Bayesian strategies that are capable of maximising the probability of grasp success given the current belief state, as in
(Hsiao et al., 2011a). However, belief space planning can be computationally expensive, especially for highdimensional, complex density functions. A popular choice is to constrain the belief space to Gaussian density functions.Since many robotic problems have multimodal uncertainty, there are benefits in using a nonparametric representation of the belief space, such as a particle filter (PF). There are many examples of PFs used for state estimation in manipulation problems (Petrovskaya and Khatib, 2011), (Nikandrova et al., 2013) and (Platt et al., 2001). Most of these methods sample an initial particle set from an userdefined distribution attached to the maximum a posteriori estimate (MAP) obtained from vision, and often the uncertainty is limited to 2D.
2.2 Grasp Synthesis
There are analytic and empirical approaches to grasp synthesis (Sahbani et al., 2012). Analytic approaches are typically associated with an optimisation problem, and thus the computational effort grows with the size of the grasp solution space, which in turn grows with the number of fingers and contact points. On the other hand, data driven approaches learn a mapping from an object description to the grasp (Saxena et al., 2008; Detry, 2010; Kopicki et al., 2014; Gu et al., 2017). Learning this mapping from incomplete or erroneous data remains challenging. In (Detry, 2010), the authors address the problem of associating a grasp with a partially perceived object.
In our previous work (Kopicki et al., 2014, 2015), an efficient method is presented to learn a dexterous grasp type (e.g. pinch, rim) from a single example. The method can generalise within and across object categories, and use full or partial shape information. In the original work in (Kopicki et al., 2014, 2015), each contact model is associated with an (openloop) approaching trajectory demonstrated during the learning phase. In this paper, we employ the same contact model described in (Kopicki et al., 2015) but we do not use a purely open loop grasp method. Instead we replace it with the trajectory computed by our planner, and recomputed in the case of feedback from tactile contact.
2.3 Grasp Planning
Some algorithms pose the reach to grasp problem as a beliefstate planning problem. Typical implementations rely on constraining the belief space to a Gaussian. Extensions to nonparametric representations of belief (Nikandrova et al., 2013), typically result in intractable planning problems due to the high dimensionality of the nonGaussian parametrisation. In contrast, Platt et al. (Platt et al., 2001), (Platt et al., 2012), enabled efficient planning under state uncertainty. The key was to plan an action sequence which will generate observations that distinguish between competing hypotheses. However, this was formulated and tested for bimanually grasping a cardboard box, using a laser range finder on one arm as the sensor. All movements of the arms were constrained to lie in the horizontal plane, as was the object pose uncertainty. With this formulation, a hypothesis based plan was created to gather information, while aligning the robot’s arm equipped with the laser in front of the box’s edge. The system required a usersupplied model of the object and relied on a userdefined threshold to know when the belief is accurate enough to switch from gathering information to executing a precomputed grasp action. Another difference with our work is that Platt’s formulation of the hypothesisbased planner is defined on a set of actions (i.e., leftright movement of the left arm of the robot) that are distinct from the actual grasping action (i.e., a pinch grasp with two paddles). In contrast, we formulate the grasping and information gathering problems on the same action space (i.e., dexterous reachtograsp trajectories) and our system can dextrously grasp nonconvex objects with nonGaussian 6D uncertainty.
There is also evidence that humans compensate for uncertainty due to noisy motor commands and imperfect sensory information during the execution of reachtograsp movements (Kording and Wolpert, 2004). The authors in (Christopoulos and Schrater, 2009) induced pose uncertainty on the object to be grasped in order to investigate the compensation strategies adopted by humans. The authors evaluated the benefits of uncertainty compensation, based on the hypothesis that participants prefer to generate forceclosure grasps at first contact, and therefore the participants tended to modify their approach along the direction of maximum uncertainty and increase their peak gripwidth. The experimental results in our paper show that our information reward algorithm, IR3ne, yields a similar behaviour.
3 Problem Formulation
In this section, we formulate our informationbased planner for dexterous robot grasping under pose uncertainty. Figure 3 shows our algorithm at glance.
Let us consider a discretetime system with continuous, nonlinear, deterministic dynamics,
(1) 
where is a configuration state of the robot and
is an action vector, both parameterised over time
. Our goal is to find a trajectory, , the moves our dexterous robot manipulator from its current pose, , to a target grasp, while compensating for the object pose uncertainty. Thus the three following subproblems are required to be solved:
How to estimate the object pose uncertainty.

How to compute the target grasp, , on an unknown object with pose uncertainty.

How to compensate for the uncertainty along the reachtograsp trajectory, .
Density distribution over the object’s pose (belief state)  

Nonlinear system  
Robot’s pose at time t  
Observation function  
Expected observation at time  
Set of hypotheses subsampled from  
Number of particles in  
Number of particles to approximate  
Cost function for planning  
Model point cloud (PCOM)  
Contact model for a grasp  
Reference frame for in world coordinates  
Number of particles in  
Initial set of object’s poses  
Time step in  
Final time step  
Elements of  
MLE hypotheses in  
Hypotheses in sampled from  
Query point cloud  
Block diagonal of measurement noise covariance matrix  
Contact query for a grasp  
Reference frame for in world coordinates  
Action vector  
Robot’s trajectory  
Robot’s pose in joint space  
Ending of the reachtograsp trajectory (target grasp)  
Pose in SE(3) in world coordinate  
Tactile observation vector  
Pose in SE(3) relative to 
3.1 Belief state estimation
To model multimodal uncertainty in the object pose, we employ a nonparametric representation (PF) of the belief state, , defined as a density function in SE(3). Let us denote by the Special Euclidean group of 3D poses (a 3D position and 3D orientation). An element is a homogeneous transformation. Previous works typically initialised the belief by using an MLE of the object pose (i.e., from vision) and then manually generating a set of particles by sampling from a Gaussian distribution attached to the MLE. In contrast, our approach uses a samplebased modelfitting procedure similar to the one presented in Hillenbrand and Fuchs (2011) to generate a set of MLEs, which is used as an initial belief state for the system. Thus, our belief state is defined nonparametrically by a set of 3D poses (or particles) . The probability density in a region is determined by the local density of the particles. The underlying pdf is approximated through a KDE (Silverman, 1986), by assigning a kernel function to each particle supporting the density, as
(2) 
where is the density value associated with the pose , is a diagonal matrix that represents the kernel bandwidth and is a normalised weight associated to such that . An initial set of poses, , is constructed by employing a pairfitting model as explained below. However, it is assumed that a model of the object as a point cloud (PCOM) is available to the system prior to execution, see Sec. 3.2.
The modelfitting procedure we employ to create an initial belief state is a surfletpair fitting procedure (Hillenbrand and Fuchs, 2011)
. This algorithm is wellknown in the computer vision community and works as follows. Let us consider the case in which a
model pointcloud, , is available to the system, or collected onthefly, and a new query point cloud, , is acquired. Let us also assume that each point and normal of is described w.r.t. an arbitrary reference frame, or pose, in world coordinates. The goal of the algorithm is to search for a rigid body transformation , described w.r.t. , that best aligns with . This is achieved by sampling a set of surflet features for each point cloud (i.e. pairs of points with their relative normals) and aligning the features on with the most similar in . The algorithm also produces a score value to describe the goodness of the alignment. We repeat this procedure times to construct our initial belief state, see Eq.(2), as a set of particleswhere is the estimate created by the fitting procedure, is the associated score, is a product of two homogeneous transformations and is a candidate pose of the object in our world coordinate system.
3.2 Acquisition of a PCOM
Our initial belief state estimation is created from visual data and we employ a samplebased modelfitting procedure, Sec. 3.1, which relies on a PCOM available to the system before execution. Nonetheless, the acquisition of a PCOM is done by scanning the workspace of the robot with a depth camera. For novel objects, when the associated PCOM is not already present in the dataset, the system can acquire the PCOM on the fly and use the same point cloud as the model and query . Although it may seem redundant to compute the pose density on the same point cloud since one would expect to obtain a good fit when comparing a point cloud with itself, the benefit of maintaining a density over the pose estimation overcomes the overhead computation, as shown in our results, Sec. 5.
It is important to notice that, in the current implementation of the system, the PCOM dataset contains only the model of the object to be grasped. Future work will focus on constructing a proper dataset of point clouds from which the system could generate a density function that would also take into account the shape uncertainty. However, this is outside the scope of this paper that only focus on demonstrating the benefits of compensating for pose uncertainty in dexterous reachtograsp actions.
3.3 The target grasp
In order to create a plan that compensate for uncertainty we need to associate a grasp to the maximum likelihood pose of the object to be grasped. From Sec. 3.1, we expect the object’s point cloud, , to be in pose .
The target grasp is computed by using our oneshot grasping algorithm. This algorithm enables us to learn a contact model, , for a type of grasp (i.e., pinch with support) that describes the distribution of the contacts between the robot’s end effector, , and the object’s surface, , with pose , from a demonstration. We can compute offline and store it in the system. Given a novel point cloud, , with pose , a new density function, is obtained for the new object. A candidate grasp can be sampled from , so that . For further details refer to (Kopicki et al., 2015). Figure 4 shows a pinch with support grasp learned on a bowl and transferred to a jug.
3.4 Planning a trajectory to maximise information gain
We know that in general the problem of planning in belief space is intractable (Littman, 1996). Instead, let us consider a method to search for a sequence of actions, , that distinguish between observations that would occur if the object were in the expected pose, , from those that would occur in other poses.^{1}^{1}1After the initial estimation of the belief state we do not rely anymore on vision. The only observations available to the system during the execution and replanning are tactile observations. See Fig. 2 for a graphical representation in 2D. A large number of particles, , is suitable to track the belief state and converge to the true state of the system. Nevertheless, to create a plan, it is more convenient to subsample the hypotheses. Thus, we define a subset of hypotheses,
(3) 
such that is the maximum likelihood estimate and , with .
The system in Eq.(1) also produces continuous, nonlinear, observation dynamics,
(4) 
where is a vector that describes the expected observation (contacts), at time , between the joint state of the robot, , and the object in pose . More generally, let be the robot configuration at time if the system begins at state and takes action . Therefore, the expected sequence of observations over a trajectory, , is:
(5) 
a column vector which describes the expected contacts at any time step of the trajectory . We then need to search for a sequence of actions which maximise the difference between observations that are expected to happen in the sampled states, , when the system is actually in the most likely hypothesis, . In other words, we want to find a sequence of actions, , that minimises
(6) 
where denotes the Gaussian distribution with mean and covariance and is the block diagonal of the measurement noise covariance matrix. Rather than optimising (6) we follow the suggested simplifications described in (Platt et al., 2001) by dropping the normalisation factor in the Gaussian and optimising the exponential factor only. Let us define for any
(7) 
then the modified cost function is
(8) 
it is worth noting that, when there is a significant difference between the two sequences of expected observations, and , the function is large and therefore is small. On the other hand, if the two sequences of expected observations are very similar to one another, their distance tends to 0 and tends to 1. Equation (8) can be minimised using a variety of planning techniques, such as rapidlyexploring random trees (RRTs) (LaValle, 1998), probabilistic roadmaps (PRM) (Kavraki and Svestka, 1996), differential dynamic programming (DDP) (Jacobson and Mayne, 1970) or sequential dynamic programming (SDP) (Betts, 2001). Next, we show the implementation of our information gathering algorithm by using a modified version of a PRM to cope with high DoF manipulators and nonconvex objects.
4 InformationReward based RePlanning Algorithm
This section provides a detailed explanation of the principal components in our architecture.First, Sec. 4.1 presents our observational model for tactile observations. Sections 4.2 and 4.3 show our modified PRM algorithm which allows us to efficiently plan reachtograsp trajectories for high DoF manipulators, as well as compute the information reward function, described in Eq. (8), for each node of the PRM. Section 4.4 describes our PCOMbased collision detection algorithm. Sections 4.5 and 4.6 show how unexpected contacts are integrated into the belief state and the replanning phase. Finally, Sec. 4.7 presents the terminal conditions for the algorithm.
4.1 An observation model for tactile contacts
The manipulator is composed of a set of rigid links organised in a kinematic chain and tree, comprising an arm, , and a set of multijoint fingers, denoted . We are only interested in the observations made by the fingers. Thus, we define , or simply , as the joint configuration of the finger. Through forward kinematics we know the relative pose of the fingertip, denoted .
The expected observation (contact) is related to the probability of a physical contact between the finger links and the object. We assume that this probability decays exponentially, based on the distance, , between each finger and the closest surface of the object assumed to be in pose (the computation of is explain in Sec.4.4). The observation model is limited to contacts which may occur on the internal surface of fingers. This directly affects the planner, which rewards trajectories that would generate contacts on the fingertips rather than on the back side of the fingers. Therefore we rewrite Eq.(4) as , where
(9) 
where is the inner product of, respectively, the fingertip’s normal and the estimated object surface’s normal, and describes a maximum range in which the probability of reading a contact is nonzero, is a normaliser.
4.2 Implementation of our planner IR3ne
The approach here incorporates the expected (tactile) value of information into the metric for the underlying physical space. This results in warping distances so as to create a nonEuclidean space, in which trajectories that maximise information gain are less costly^{2}^{2}2In this sense, the cost function we propose works similarly to the Mahalanobis distance..
We employ a modified version of Probabilistic Roadmap (PRM) (Kavraki and Svestka, 1996) to sample the reachable configuration space of the robot. The original PRM algorithm is composed of two phases: i) a learning phase, in which a connected graph of obstaclefree configurations of the robot is generated and, ii) a query phase, which finds a path between a given pair of configurations . However the computational cost for the learning phase grows fast with respect to the dimensionality of the space. Because of this we incrementally build connections between neighbouring nodes during the query phase.
Given a pair , which describes the root state in configuration space, , and the goal state in workspace, , of the trajectory, we use an A* algorithm to find a minimum cost trajectory on the graph . The A* algorithm selects the path which minimises the objective function
(10) 
where and , is a reachable goal configuration for the robot computed by inverse kinematics. The function is a cumulative travel distance to from , while is a linear combination of the costtogo from to a neighbour node and the expected costtogo from to the target, defined as:
(11) 
where , is a distance function in , and behaves as for neighbouring nodes, but returns otherwise. Equation (10) does not violate the consistency and admissibility of the cost function for A* in Euclidean spaces. The algorithm BSP (Sec. 5) uses the above cost function to plan reachtograsp trajectories.
In IR3ne, we modify the heuristic
in order to reward informative tactile explorations while attempting to reach the goal state (described as a target configuration of the manipulator), by embedding the information value as follows:(12)  
where is the diagonal covariance matrix of the sampled states, for any column vector , is the Mahalanobis distance centred in and is a factor which rewards trajectories with a large difference between expected observations if the object is at the expected location, , versus observations that would be expected if the object is at other poses, , sampled from the distribution of poses associated with the object’s positional uncertainty:
(13) 
where:
(14) 
for each , and is sequence of probabilities of reading a contact travelling from state to . In this implementation . In other words, we evaluate the probability of making a contact while moving from state to as the probability of making a contact only in the next state . Figure 2 shows the effects of the reward function in a 2D example.
Note that this observation model is designed to conserve Eq.(12) as in Eq.(11) when the probability of observing a tactile contact is zero. In fact, for robot configurations in which the distance to the sampled poses is larger than a threshold, , the cost function is equal to 1. However we also encode uncertainty in , which evaluates the expected distance to the goal configuration. In this way the planner also copes with pose uncertainty at the early stages of the trajectory, when the robot is still too far away from the object to observe any contacts.
4.3 Planning for dexterous manipulator
Several hierarchical path planning techniques have been developed over the past years to enable high DoF robots to move in complex environments. Typically, the hierarchy makes use of: i) a global planner and ii) a local planner. The global planner behaves as a samplebased approach to probe a gridbased environment to quickly find a coarse solution. Subsequently, the local planner behaves as an optimisation procedure to refine the solution. This approach has been exploited in numerous variations, but mostly focussed on mobile robots, as in (Zhu and Latombe, 1991). For dexterous grasping, however, the path planning problem is usually addressed by constraining the plan on a set of manifolds that capture the human hand’s workspace, as in (Rosell et al., 2011).
In contrast, we adopt a hierarchical path planner that enables us to plan dexterous reachtograsp trajectories up to 21 DoF. First a PRM is constructed only in the arm configuration space in order to find a global path between the
. It is worth noticing that, in this phase, the rest of the joints of the manipulator are interpolated in order to have a smooth passage from
to . Then the planned trajectory is refined by constructing a new local PRM. The local PRM only samples configurations nearby the waypoints of the global trajectory. However the local PRM has a higher dimensional configuration space to represent the whole of the manipulator (e.g. arm + hand joint space). Subsequently we employ a Differential Evolution (DE) procedure to optimise the trajectory and generate a smoother transition from one configuration to the next.This approach allows us to integrate the information gain cost function described in Eq.( 12) into both levels of the hierarchy. Hence, we enable the robot to adjust the trajectory of each of its fingers if more information is expected to be gained.
4.4 Planning a dexterous grasping trajectory for nonconvex objects
The problem of efficiently computing intersectiondepth between two intersecting objects is still an open problem especially in the general case, in which at least one of the two object is a nonconvex polyhedron. Our approach is based on a penetration depth (PD) algorithm, which is a measure of the overlap between two objects. The PD algorithm for nonconvex polyhedra has a lower bound for the worstcase time complexity of , where and are the number of faces that compose the objects (Weller, 2013).
In contrast, we represent the robot’s bodies by an openchain of convex polyhedra, and the object to be grasped as a 2level structure (Fig. 7). The first level of this structure wraps the PCOM in a bounding box. This level can efficiently avoid checking collisions when the robot’s links are far from the object to be grasped. The second level contains the PCOM organised in a KDTree (Muja and Lowe, 2014).
Our approach works as follows. Each link has a precomputed boundary defined as a set of triangle meshes and a reference frame placed at the centre of mass of the polyhedron. Each triangle mesh, , is composed of three vertices and a normal , such that is chosen with the direction that points outside the polyhedron. Given a set of nearest points, , to the link pose , we compute the penetration depth value for each triangle, , as the average depth
where is the cardinality of the set , is the distance of from the reference frame, approximated as the distance to ’s first vertex, and is a dot product. The depth associated with the bound is computed as .
The value is not a distance, but it assumes negative values if the majority of points in lie outside the polyhedron defined by .
Our planning process uses the PD algorithm in two cases: i) to reject PRM nodes in collision with an object and ii) to compute the information value for each node of the PRM. In the latter case, we rewrite from Eq.(9) as
(15) 
The worstcase complexity of the overall PD algorithm is dominated by the estimation of the closest point in the PCOM, which is , where is the number of points in the PCOM (Muja and Lowe, 2014) and the number of bounds in the hand. The same happens for the space complexity which is bounded by storing the PCOM in the kdtree, .
4.5 Belief update
When tactile observations occur the algorithm refines the current belief state. We think of the reachtograsp trajectory as composed of two parts: i) the approach trajectory which leads to a pregrasp configuration of the robot in which the fingers generally cage the object to be grasped without generating any contact, and ii) a finger closing trajectory which moves the fingers into contact and generates a force closure grasp. In this way any contact which occurs during the approach trajectory is considered as an unexpected observation. Similarly an insufficient number of contacts for a force closure at the end of a grasping trajectory can also be used as an observation. In our implementation, the belief is updated assuming deterministic dynamics.
At any time step, , a belief state , supported by a set of particle , represents a density function over poses for the object to be grasped. Once a trajectory is executed and a real (unexpected) observation is detected, the belief state is updated by integrating the new acquired information. We denote the updated belief, which is computed following the Bayes’ rule
(16) 
in which is the normalising factor, and then resampling is performed to generate a posteriori distribution as new set of particles .
4.6 Replanning
We need to rely on sensory feedback during the execution of the planned trajectory in order to detect whether or not unexpected observations occur. This triggers a belief update, using the observation gathered at executiontime, and consequently a replanning phase. In our experiments, the algorithm uses torque sensors based on the current draw at each joint of the robot’s hand to detect whether or not a link of the hand is in contact with the environment.
4.7 Terminal conditions
Our algorithm terminates its execution when no unexpected contacts occur and the target grasp is achieved. Since we do not have mesh models of the object to be grasped we cannot rely on force closure measures to signal successful termination of the algorithm. Nonetheless, in simulation it is possible to measure if the planned contacts between the fingers and the object’s point cloud have been achieved. On the real robot, the success of the grasp is evaluated by lifting the object. If the robot can hold the object, the grasp is considered successful.
5 Experiments
Three strategies for planning dexterous reachtograsp trajectories are evaluated in both a halfhumanoid robotic platform, Boris, and a simulated environment:

PRM: an openloop motion planner that plans trajectories towards the expected object pose without replanning.

BSP: a sequential belief state replanner that tracks high dimensional beliefs.

IR3ne: a sequential informationreward based replanner that maximises the information gain by planning in a fixed dimensional hypothesis space.
Table 3 summarises the differences between the algorithms, by showing which features are implemented. The features represent i) Motion Planning (MP), the ability of the algorithm to compute a freecollision trajectory for a robot manipulator, ii) Belief Planning (BP), the ability of updating the highdimensional belief state by integrating new observations, iii) Hypothesis Based Planning (HBP), the ability of planning in a fixed dimensionality space underlying the belief state, iv) Simultaneous Information Gain & Grasping (SIGG) and v) RePlanning (RP).
Sec 5.3 presents empirical results collected using Boris (Fig. 1), in which we tested the ability of each algorithm to pick up an object in a real scenario under the presence of pose uncertainty, due to incomplete perception abilities. The results are based on a total of 30 trials (10 per strategy) on a plastic jug.
The experimental results show that:

Sequential replanning achieves higher grasp success rates than openloop grasping from vision.

Trajectories that maximise information gain (IR3ne) generate contacts with the object that are more informative (to locate it).

More informative contacts lead to fewer replanning iterations to converge to a grasp.

Trajectories that maximise information gain increase up to 50% the proportion of grasps on a first attempt.

On the real robot, IR3ne increases grasp reliability over BSP.
5.1 Experiments in a virtual environment
The experiments are composed of 120 trials per object in a virtual environment, and they are run over four different objects: a jug, a coke bottle, a stapler and a Mr Muscle spray bottle. The algorithm has a model of the object to be grasped, in the form of a dense point cloud, acquired by scanning the object with a depth camera. Figure 8 shows the PCOM of each simulated object, and the associated target grasp configurations. For these experiments, the dense models are composed of 7 singleview point clouds.
Four different conditions have been tested. In each condition we randomly selected either 1, 3, 5 or 7 views of the object, in order to simulate a real situation in which the robot’s depth camera observes smaller or larger parts of the object. Hence, each trial has a different initial probability density over the object pose which depends on how much of the point cloud is visible. The density is computed by using our modelfitting algorithm as described in Sec. 3.1 with 1,000 surflet features sampled, which makes the estimation very fast but not accurate.
Figure 10 summarises the data collected in our experiments. All the results are compared with respect to the model coverage in percentage terms, i.e. one view typically covers 20% of the object. A ground truth pose for the object is also calculated for each object by using our modelfitting algorithm (Sec. 3.1), but sampling 500,000 surflet features. The algorithms have no knowledge of the ground truth pose, however in the virtual environment the ground truth is used to model the real object location, and is used to trigger simulated contacts with the robot hand. These simulated contacts cause the sequential replanning algorithm to stop and update the belief state.
5.2 Discussion of simulated experiments
Method  MP  BS  HBP  SIGG  RP 

PRM  ✓  ✗  ✗  ✗  ✗ 
BSP  ✓  ✓  ✗  ✗  ✓ 
IR3ne  ✓  ✓  ✓  ✓  ✓ 
Section 5.1 presents the results collected on 120 trials in simulation for four objects: a jug, a bottle of coke, a stapler and a Mr Muscle spray bottle. In these experiments, the aim is to evaluate the ability of the three strategies to localise the target object as well as converge to a target grasp configuration with artificially induced pose uncertainty.
The results in Fig. 10 confirm that i) integrating the unexpected contacts into a belief state before attempting a new reachtograsp trajectory increases the success rate of grasping (Fig. (a)a) and ii) that information rewarded trajectories requires less replanning iterations than conventional planning methods (Fig. (c)c).
In contrast to typical belief space planning algorithms that separate exploratory and grasping actions and plans how to sequence them, our experimental results show that we do need to feed to the algorithm a confidence threshold to ensure a global pose estimation accurate enough to attempt a grasp.
Figure (b)b presents the expected information gain at each contact. We compute the information gain as follows
(17) 
in which is the KullbackLeibler (KL) divergence between the pre and post contact belief states. Each particle is an element of the respective (i.e. pre or post) set of hypotheses as described in Eq. 3, and is the likelihood associated with by the density . A KL divergence of indicates that we can expect similar, if not the same, behaviour of two different distributions, while a KL divergence of
indicates that the two distributions behave in such a different manner that the expectation given the first distribution approaches zero. The results confirm that the contacts produced by IR3ne are significantly more informative than the ones generated by a conventional planner, especially when the uncertainty is wider. Also, the much smaller standard deviations for IR3ne compared to BSP in Fig.
(b)b show a more consistent and systematic acquisition of information from IR3ne throughout the trials.5.3 Experiments on the robot Boris
We empirically demonstrated the ability of Boris of picking up objects in the presence of pose uncertainty. The pose uncertainty encodes the perceptual handicap of the robot, which leads to erroneous pose estimations when the object is poorly visible.
The results for the PRM strategy are extrapolated from BSP, since these two approaches construct a reachtograsp trajectory minimising the same cost function, therefore if BSP achieves (or fails to achieve) a grasp at the first iteration, the PRM also would succeed (or fail) equally.
The data has been collected with the following procedure. The PCOM is acquired by scanning the operational workspace with a depth camera from 6 different views. These views are aligned, registered to generate a single PCOM. In each trial, the target object is placed in a different configuration on a table in front of the robot, and a point cloud is acquired by scanning the operational workspace from 3 different views. Once these views are preprocessed, a belief state with a set of 5 particles is estimated as described in Sec. 3.1.
From the visible point cloud of the object, we compute the target grasp by adapting a pinch with support grasp learned on a bowl using the method described in (Kopicki et al., 2015). The training and test grasps are shown in Fig. 4. Then a reachtograsp trajectory is computed and performed. A trial is considered successful if Boris converges to the target grasp configuration and lift the object from the table surface.
Figure 11 summarises the empirical results collected.
5.4 Discussion of real experiments
The empirical results in Fig. 11 confirm the simulated data. Both (re)planning algorithms are more efficiently converging to a grasp in the presence of not trivial pose uncertainty, i.e. errors in the pose estimation will not allow grasping at the first iteration with a conventional planner. Trajectories that maximise information gain are capable of generating informative contacts which outperform the information obtained by the other planner. This leads to fewer iterations and a better success rate.
Interestingly, IR3ne produces trajectories which are more robust with respect to the pose uncertainty, similar to the results presented in (Christopoulos and Schrater, 2009), by converging to the target grasp in a single attempt. Figure 12 shows the IR3ne converges to a grasp in a single attempt up to 50% more than a planner that does not explicitly reason on the belief state.
Figure 6 is an interesting case in which the handle of the jug is not visible from the query point cloud and the object looks almost symmetric, which yields the meanshift algorithm to a wrong pose estimation of about 180 degrees in the orientation. Nevertheless, IR3ne grasped in a single iteration.
These empirical results have shown the validity of the sequential replanning approach on a real scenario, however there are several issues that have to be addressed. First, the need for a predictive model for predicting how the target object moves after a contact occurs. Although the robot reliably stops after a contact is made, light objects, such as the plastic jug used in these trials, may be perturbed by the contact. For example, often we observe that the jug is tilted by a finger. The belief update by itself cannot deal with these cases and the resulting mean estimations are usually incorrect. However, the choice of point cloud models negatively affects the ability to develop predictive models. Second, the lack of tactile sensors does not allow us to have a good estimate of which link of the finger experienced the contact. This results in an additional uncertainty in the belief update.
Future work aims to address such issues by extending the sequential replanning approaches to the use of tactile sensors and simple heuristics for an objectindependent predictive model. The latter should depend on simple geometric properties of nonpenetration between the object and the robot’s fingers, so as to constrain the belief update and the mean pose estimate.
6 Conclusion
This paper has presented an approach for active tactile grasping that covers all stages: object modelling, state estimation from both vision and touch, grasp planning, replanning, and execution. We have presented and tested a novel algorithm for planning reachtograsp trajectories under object pose uncertainty that actively rewards (tactile) information gathering.
The main contribution of this work is to exploit active information gathering to the problem of dexterous robot grasping under nonGaussian 6D uncertainty. We do so by embedding expected information value directly in the physical space. This results in an efficient planning algorithm that simultaneously gathers information while grasping. We have also shown that our approach i) does not require a mesh model of the object to be grasped and ii) is not limited to grasp convex objects. We used a point cloud object model (PCOM) to represent the objects and presented i) a tactile observation model for PCOMs and ii) a pointcloud based collision detection for nonconvex objects that can be efficiently integrated into the planner.
Experiments have shown that our approach (IR3ne) is effective in modifying reachtograsp trajectories to maximise the expected information gain while bringing the hand to a grasp configuration which is likely to succeed. In line with results in literature, we confirm that sequential replanning achieves success rates which are higher than the success rates achieved with simple grasp attempts. Furthermore, IR3ne proves to require fewer replanning iterations than conventional planning methods and improve up to 50% the proportion of grasps that succeed on a first attempt. In contrast to conventional belief state planning algorithms, the experimental results confirm that our approach does not need to localise the object above a threshold of confidence to converge to a grasp.
The choice of a PCOM limits predictions of the object motion consequent to a contact. Even a light contact might move the object to a potentially unstable configuration. Future work will relax the assumption that tactile contacts have no physical effects. In addition, the incompleteness of the object model might have an effect on the detection of collisions during planning, such as generating trajectories which pass through the real target object. The current implementation of the algorithms interprets such contacts as unexpected observations which trigger replanning. As future work, we have the intent to extend the planner to account for the likelihood that the unexpected contact was due to erroneous or incomplete shape information. Ideally, this would result in a SLAM problem, i.e., simultaneously localise an object while also mapping its shape, during iterative reachtograsp trajectories.
References
 Bai et al. (2010) Bai, H., Hsu, D., Lee, W.S., Ngo, V.A.: (2010). Monte carlo value iteration for continuousstate pomdps. In: Algorithmic foundations of robotics IX, pp. 175–191. Springer
 Betts (2001) Betts, J.: (2001). Pratical methods for optimal control using nonlinear programming. Siam
 Bracci et al. (2017) Bracci, F., Hillenbrand, U., Márton, Z., Wilkinson, M.H.F.: (2017). On the use of the tree structure of depth levels for comparing 3d object views. In: Proceedings International Conference on Computer Analysis of Images and Patterns
 Brooks et al. (2006) Brooks, A., Makarenko, A., Williams, S., DurrantWhyte, H.: (2006). Parametric pomdps for planning in continuous state spaces. Robotics and Autonomous Systems 54(11), 887–897
 Christopoulos and Schrater (2009) Christopoulos, V., Schrater, P.: (2009). Grasping objects with environmentally induced position uncertainty. In: PLoS Computational Biology, vol. 5
 Detry (2010) Detry, R.: (2010). Learning of multidimensional, multimodal features for robotic grasping. Ph.D. thesis, University of Liege
 Gu et al. (2017) Gu, H., Zhang, Y., Fan, S., Jin, M., Liu, H.: (2017). Grasp configurations optimization of dexterous robotic hand based on haptic exploration information. International Journal of Humanoid Robotics 14
 Hillenbrand (2008) Hillenbrand, U.: (2008). Pose clustering from stereo data. In: VISAPP Int. Workshop on Robotic Perception
 Hillenbrand and Fuchs (2011) Hillenbrand, U., Fuchs, A.: (2011). An experimental study of four variants of pose clustering from dense range data. In: Computer Vision and Image Understanding
 Hsiao et al. (2011a) Hsiao, K., Ciocarlie, M., Brook, P.: (2011a). Bayesian grasp planning. In: IEEE Proc. Workshop on Mobile Manipulation: Integrating Perception and Manipulation, Int. Conf. on Robotic and Automation (ICRA)
 Hsiao et al. (2011b) Hsiao, K., Kaelbling, L.P., LozanoPérez, T.: (2011b). Robust grasping under object pose uncertainty. Autonomous Robots 31(23), 253–268
 Jacobson and Mayne (1970) Jacobson, D., Mayne, D.: (1970). Differential dynamic programming. Elsevier
 Kavraki and Svestka (1996) Kavraki, L., Svestka, P.: (1996). Probabilistic roadmaps for path planning in highdimensional configuration spaces. In: IEEE Trans. on Robotics and Automation
 Kopicki et al. (2015) Kopicki, M., Detry, R., Adjigble, M., Stolkin, R., Leonardis, A., Wyatt, J.: (2015). One shot learning and generation of dexterous grasps for novel objects. The International Journal of Robotics Research
 Kopicki et al. (2014) Kopicki, M., Detry, R., Schmidt, F., Borst, C., Stolkin, R., Wyatt, J.: (2014). Learning dexterous grasps that generalise to novel objects by combining hand and contact models. In: IEEE Proc. Int. Conf. on Robotic and Automation (ICRA)
 Kording and Wolpert (2004) Kording, K.P., Wolpert, D.M.: (2004). Probabilistic inference in human sensorimotor processing. Journal of Neurophysiology 92, 3161–3165
 LaValle (1998) LaValle, S.M.: (1998). Rapidlyexploring random trees: A new tool for path planning. Tech. rep., Computer Science Dept, Iowa State University
 Littman (1996) Littman, M.: (1996). Algorithms for sequential decision marking. Ph.D. thesis, Department of Computer Science, Brown University

Muja and Lowe (2014)
Muja, M., Lowe, D.: (2014).
Scalable nearest neighbor algorithms for high dimensional data.
Pattern Analysis and Machine Intelligence (PAMI) 36, 2227–2240  Murphy (2000) Murphy, K.P.: (2000). A survey of pomdp solution techniques. environment 2, X3
 Nikandrova et al. (2013) Nikandrova, E., Laaksonen, J., Kyrki, V.: (2013). Towards informative sensorbased grasp planning. Robotics and Autonomous Systems pp. 340–354
 Petrovskaya and Khatib (2011) Petrovskaya, A., Khatib, O.: (2011). Global localization of objects via touch. IEEE Trans. on Robotics 27(3), 569–585
 Platt et al. (2001) Platt, R., Kaelbling, L., LozanoPerez, T., Tedrake, R.: (2001). Simultanious localization and grasping as a belief space control problem. Tech. rep., CSAIL, MIT
 Platt et al. (2011) Platt, R., Kaelbling, L., LozanoPerez, T., Tedrake, R.: (2011). Efficient planning in nongaussian belief spaces and its application to robot grasping. In: Proc. of the Int. Symposium on Robotics Research
 Platt et al. (2012) Platt, R., Kaelbling, L., LozanoPerez, T., Tedrake, R.: (2012). Nongaussian belief space planning: Correctness and complexity. In: IEEE Proc. Int. Conf. on Robotic and Automation (ICRA)

Porta et al. (2006)
Porta, J.M., Vlassis, N., Spaan, M.T., Poupart, P.: (2006).
Pointbased value iteration for continuous pomdps.
Journal of Machine Learning Research
7(Nov), 2329–2367  Rosell et al. (2011) Rosell, J., Suarez, R., Rosales, C., Perez, A.: (2011). Autonomous motion planning of a handarm robotic system based on captured humanlike hand postures. Autonomous Robots 31(1), 87
 Sahbani et al. (2012) Sahbani, A., ElKhouryc, S., Bidauda, P.: (2012). An overview of 3d object grasp synthesis algorithms. Robotics and Autonomous Systems 60(3), 326–336
 Saxena et al. (2008) Saxena, A., Driemeyer, J., Kearns, J., Ng, A.: (2008). Robotic grasping of novel objects using vision. Int. Journal of Robotics Research 27(2), 157–173
 Silverman (1986) Silverman, B.: (1986). Density Estimation for Statistics and Data An. Chapman & Hall/CRC
 Tuzel et al. (2005) Tuzel, O., Subbarao, R., Meer, P.: (2005). Simultaneous multiple 3d motion estimation via mode finding on lie groups. In: Int. Conf. Computer Vision
 Weller (2013) Weller, R.: (2013). A Brief Overview of Collision Detection, chap. 2. Springer International Publishing
 Zhu and Latombe (1991) Zhu, D., Latombe, J.: (1991). New heuristic algorithms for efficient hierarchical path planning. IEEE Trans. Robotics and Automation 7, 9–20