I Introduction
Robot grasping of unknown objects is an important problem and an essential component of various applications, such as robot object packing [48, 47] and dexterous manipulation [4, 54]. Earlier methods [21, 11, 30, 13] could generate grasp poses for an arbitrary gripper or target object, but they ignored the uncertainty of the real world. Recent learning-based methods [29, 49, 5, 40, 33, 10, 26] have demonstrated improved robustness to sensor noise. Instead of directly inferring the grasp poses, these methods learn intermediary information, such as grasp quality measures [29] or reconstructed 3D object shapes [49], and then use this information to help infer grasp poses. On the positive side, learning this kind of information has been shown to improve both the data efficiency of training and the success rate of predicted grasp poses. On the downside, this intermediary information complicates the training procedure, hyperparameter search, and data preparation [49].
Ideally, a learning-based grasp planner should infer the grasp poses directly from raw sensor inputs such as RGB-D images. Such approaches have been developed by many researchers [39, 22]. However, recent methods [29, 5] show that it is preferable to first learn a grasp quality metric function and then optimize the metric at runtime for an unknown target object using sampling-based optimization algorithms, such as multi-armed bandits [28]. Such optimization can be very efficient for low-DOF parallel-jaw grippers but less efficient for high-DOF anthropomorphic grippers due to their high-dimensional configuration spaces. In addition, the sampling algorithm can generate samples at any point in the configuration space, and the learned metric function has to return accurate values for all these samples. To achieve high accuracy, a large amount of training data is needed, such as the 6.7 million ground truth grasps in the dataset used by [29].
Various techniques have been proposed to improve the robustness and efficiency of grasp planner training. Prior works [10, 49] proposed to improve the data efficiency of training by having the neural network recover the 3D volumetric representation of the target object from 2D observations. A 2D-to-3D reconstruction subtask allows the model to learn intrinsic features about the object. However, a volumetric representation also incurs higher computational and memory cost. In addition, compared with surface meshes, volumetric representations based on signed distance fields cannot resolve delicate, thin features of complex objects [26]. As an alternative, prior works [14, 33] show that higher robustness can also be achieved using adversarial training, which in turn introduces additional training subtasks and requires new data.
Main Results: We present a differentiable theory of grasp planning, extending ideas from [9], an early attempt to formulate grasp planning as a continuous optimization. Our main contribution is a generalized definition of the grasp quality metric that is defined when the gripper is not in contact with the target object. We show that this metric function is locally differentiable, and that its gradient can be computed from the sensitivity analysis of the optimality condition in a similar manner as [2]. We also propose a loss function to ensure that grasps are (self-)collision-free in a differentiable manner, which can be computed from only surface meshes of target objects.
Our method can be used as a locally optimal grasp planner similar to simulated annealing [30], but our method is guided by analytic gradients and can quickly find a locally optimal solution. More importantly, our method can be used to improve the quality of learned grasp poses using a simple neural network architecture. Specifically, we use a network that takes as input a set of multi-view depth images of the target object and directly predicts a grasp pose for a high-DOF gripper. This design choice is preferable to prior work [28] because it leads to higher runtime performance: there is no need to optimize a learned grasp metric, and we obtain the grasp pose by a single forward propagation through the neural network.
By adding our differentiable loss, we show that the simple neural network architecture can predict high-quality grasps for the Shadow Hand (fig:results) after training on a dataset of only 400 objects and 40K ground truth grasps. When compared with the supervised-learning baseline [26], our method achieves a higher success rate on physical hardware and a higher value of the grasp quality metric [15]. Our learning architecture is illustrated in fig:architecture.
II Related Work
In this section, we review related works in grasp planning using either model-based or learning-based methods.
Model-based Grasp Planners assume perfect sensing about geometries of the environment and shapes of target objects. Given the geometric information, a grasp planner searches for a grasp pose that maximizes a certain grasp quality metric; many techniques have been proposed for defining reasonable grasp quality metrics [50, 15, 41, 37] and designing efficient search algorithms [13, 15, 11, 30]. These methods can be applied to both low- and high-DOF grippers and can be classified into discrete sampling-based techniques [30] and continuous optimization techniques [9]. Sampling-based methods allow virtually any grasp quality metric to be used as the objective function, while continuous methods require the metric to be differentiable with respect to the configuration of the gripper. In practice, continuous optimization techniques are more efficient in terms of finding the (locally) optimal grasp poses.
Some planning methods [15, 51, 52, 17] only compute optimal grasp points, while others [13, 27] compute both the grasp points and the gripper poses. When a gripper pose is needed, the planner uses a two-stage approach: a set of grasp points is first selected on the surface of the target object and then the pose of the gripper is found by inverse kinematics. Based on the idea of numerically optimizing the grasp quality metric, we extend the definition of a grasp quality metric to be well-defined in the ambient space, i.e. when the gripper is not in contact with the target object, thereby unifying grasp point selection and gripper pose computation.
Learning-Based Grasp Planners can predict grasp points or gripper poses given noisy observations of the environment. Most early works [39, 38] in this direction assume that a parallel-jaw gripper is designed for the target object and that the input is a single depth image of the target object. In this case the grasp problem boils down to that of selecting the gripper's initial direction and orientation, which can be solved using an analytic method [22]. However, the noteworthy success in this problem is achieved by Dex-Net [28, 29], which uses deep convolutional neural networks to learn object similarity functions and grasp quality functions. Dex-Net can robustly pick a large number of unknown objects, and this is achieved using a dataset of tens of thousands of target objects and millions of ground truth grasp poses.
More recent techniques aim at improving the data efficiency of learning-based planners and also making the planner robust in challenging settings involving arbitrary approaching directions [14], more general gripper types [10, 27], and model discrepancies [20, 45]. It has been shown in [14], among others, that the grasp planning task can be divided into two subtasks, object reconstruction and gripper pose prediction, and learning these two subtasks can improve the rate of success. [49] showed that adversarial training can also improve the robustness of the learned model. However, these methods either perform extensive data generation or require delicate parameter tuning for the adversarial training.
A common drawback of prior works [28, 29, 14] is that they learn a grasp quality metric function or grasping success predictor, which requires an additional sampling-based optimizer to search for gripper poses. This requirement limits these methods to low-DOF grippers, since the high-dimensional configuration space of high-DOF grippers makes the sampling-based optimization computationally costly. Some recent methods [26] overcame this difficulty by directly predicting a nominal gripper pose from an observation of the object. However, the predicted gripper pose is not directly usable and needs to be post-processed. In comparison, our method predicts robust, usable gripper poses using a simple neural-network architecture and a considerably smaller training dataset. In addition, our method can be combined with previous learning-based methods to improve their results.
As an alternative to supervised learning, reinforcement learning allows a learned grasp planner to discover useful grasp poses by exploration. Learned grasp planners have been successfully applied to grasping [35] and other manipulation problems [56]. However, the amount of state-transition data needed in a typical training run is on the order of millions [35], while we show that robust gripper poses can be predicted by supervised learning on a dataset with 400 example objects and 40K ground truth grasp poses.
III Learning Grasp Poses for High-DOF Grippers
Our goal is to learn a grasp prediction network from multiple depth images of the target object, where is the depth image taken from the th camera view toward the target object to be grasped and is the learnable parameters. The output of is both the 6D extrinsic parameters and joint angles of the gripper, i.e. . This is in contrast with prior works [14, 29], where another grasp quality metric function or grasp success predicate function is learned and is a candidate grasp pose. Afterwards, the grasp pose is found by maximizing at runtime using sampling-based algorithms such as multi-armed bandits [28].
However, when the gripper is high-DOF, the maximization of becomes a search in a high-DOF configuration space, which is time-consuming. As a result, we choose to learn instead of . The major challenge in learning is to resolve the ambiguity in grasp poses, because infinitely many grasp poses can have the same grasp quality for a target object but our neural network can only predict one pose. In order to resolve this ambiguity in grasp poses, we need the dataset to be consistent. A consistent grasp pose dataset is one where all the ground truth grasp poses can be represented by a single neural network. To enforce consistency, prior work [26] attempts to train by precomputing multiple grasp poses for each target object and using a Chamfer loss to have pick the most consistent pose. However, the learned gripper poses cannot be used directly due to their low quality, and post-processing is needed before deploying the learned poses on physical hardware.
We aim at further improving the quality of the learned function without increasing the complexity of training in terms of either the amount of data or the network architecture. Instead, we are inspired by the early works [9, 15, 13], which formulate grasp planning as a continuous optimization. We incorporate all the criteria of good grasps as additional loss functions in terms of stochastic optimization. It has recently been shown that gradients can be brought through complex numerical algorithms and provide additional guidance. These domain-specific differentiable models [2, 18, 19] can significantly improve the convergence rate of neural-network training and reduce the amount of data needed. However, we need to overcome several difficulties when using these approaches for grasp planning:

All the existing grasp quality metrics have discontinuities [53], so we have to modify them for differentiability.

A grasp quality metric is only defined when the gripper and the target object have exact contact, which is generally not the case when gripper poses are being stochastically updated by the training algorithm.

Our differentiable loss function is defined for a target object represented using triangle meshes. These triangle meshes come from well-known 3D shape datasets [7, 42, 24, 23], some of which are of low quality. If gradient computation becomes unreliable on low-quality meshes (with nearly degenerate triangles), training will be misled.
We present our design of loss functions and discuss how to address the three challenging problems in the next section.
IV Differentiable Grasp Planner
Our loss function consists of three terms: , , and . The first term is a generalized grasp metric [15] that measures the quality of a grasp using physics-based rules. However, when force closure is not satisfied, both the metric value and its gradients are zero. To handle this degenerate case, we add a second, heuristic term that always provides a nonvanishing gradient. Our third term penalizes both self-collision and collisions between the gripper and the target object.
IV-A Notation
Throughout the paper, we assume that a target object is defined by a watertight triangle mesh . As illustrated in fig:illusQ, given a point in the workspace, we can also define the signed distance to as and the outward normal with respect to as . In addition, we also define as the gripper normal, i.e. the outward normal direction on the gripper mesh. We further assume that the target object's center of mass coincides with the origin of the Cartesian coordinates. During grasping, the object will be under an external wrench .
For a set of grasp points satisfying , with respective grasping forces , the quality of a grasp pose is defined by the metric [15] as follows:
(1)  
where is the frictional coefficient and is the user-provided metric tensor that is equal to . Intuitively, is the radius of the largest origin-centered 6D sphere contained in the admissible wrench space, where an admissible wrench should satisfy two conditions: limited force magnitude and frictional cone constraints.
IV-B Generalized Metric with Inexact Contacts
In practice, it is infeasible to assume that a grasp metric can be computed in its original form, i.e. eq:Q1. This is because a learning system will generally not produce grasping points that lie exactly on the surface of the target object. It is well known that incorporating hard constraints into neural networks is difficult [36]. When a stochastic training scheme is used and neural network parameters are randomly perturbed, exact constraint satisfaction will be lost. As a result, we have to deal with cases where . Taking these cases into account, we derive a generalized version of by modifying the first condition of admissible wrenches in eq:Q1 as follows:
(2) 
which essentially extends to the ambient space by an exponential weight function with two terms. The first term ensures that our generalized attains larger values when grasp points are closer to the surface of the target object. The second term ensures that our generalized attains larger values when the normal direction on the gripper and the normal direction on the target object align. Finally, it is obvious that eq:Q1EXT converges to eq:Q1 as . Like previous works [44, 32] on generalized contactimplicit models, our generalized metric allows a learning algorithm to determine the number of contact points and their positions.
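To make the role of the two exponential terms concrete, the following is a minimal sketch of one possible weight function. It is not the paper's exact constraint: the function name `contact_weight`, the parameters `alpha` and `beta`, their default values, and the sign convention (object and gripper normals are anti-parallel at an ideal contact) are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contact_weight(signed_dist, n_object, n_gripper, alpha=1.0, beta=1.0):
    """Hypothetical weight in (0, 1] scaling the admissible force at one grasp point.

    signed_dist: signed distance from the grasp point to the object surface.
    n_object, n_gripper: unit outward normals of the object and of the gripper.
    alpha, beta: assumed sharpness parameters of the two terms.
    """
    # First term: the weight shrinks as the point moves away from the surface.
    distance_term = alpha * abs(signed_dist)
    # Second term: for unit normals the dot product is -1 when they are
    # perfectly opposed, so this term vanishes for a well-aligned contact.
    alignment_term = beta * (1.0 + dot(n_object, n_gripper))
    return math.exp(-(distance_term + alignment_term))
```

In this sketch an exact, well-aligned contact keeps its full force budget (weight 1), while a point off the surface or with misaligned normals contributes less. As `alpha` grows, the weight of any point off the surface tends to zero, which is one way such a relaxation can recover an exact-contact metric in the limit.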
To train neural networks using the generalized metric, we need to compute its subgradient with respect to efficiently. Unfortunately, the exact computation of the metric is difficult because the optimization in eq:Q1 is nonconvex; several approximations have been proposed in [41, 13, 53]. We present two different techniques for computing and . The first method computes an upper bound of generalized , which is cheaper to compute but creates zero entries in the gradient vector. The second method computes a smooth, lower bound of generalized , which propagates nonzero gradient information but is more costly to compute.
IV-B1 Derivatives of the Upper Bound
Our first technique adopts [41], which approximates by assuming that must be along one of a discrete set of directions: . This assumption results in a tractable upper bound of and can be extended to our generalized metric as follows:
(3) 
which is a minmax optimization. Here the minimization is with respect to a set of discrete indices, for which subgradients can be computed. The maximization aims at finding the support of in and its optimal solution can be derived in a closed form. To show this, we first define the convex wrench space of each contact point as:
Then it is easy to verify that and the support of union of convex hulls is the maximum support of each hull, i.e.:
Finally, the support of in can be computed analytically as follows:
In this form, each operation for computing our generalized can be implemented as a standard math operation with derivatives that can be computed using automatic differentiation tools such as [34].
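The min-max structure described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the paper's formulation: the metric tensor is taken as the identity, the generalized distance and normal weights are omitted, each contact force is limited to unit magnitude, and `axes` holds the unit direction along which each contact can push (a hypothetical convention). The helper names are hypothetical as well.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cone_support(g, axis, mu):
    """Max of g . f over forces f in the friction cone around `axis` with |f| <= 1."""
    g_n = dot(g, axis)
    tangential = [gi - g_n * ai for gi, ai in zip(g, axis)]
    g_t = math.sqrt(dot(tangential, tangential))
    if g_t <= mu * g_n:
        # g lies inside the cone: the best force points straight along g.
        return math.sqrt(dot(g, g))
    if g_n + mu * g_t <= 0.0:
        # g lies in the polar cone: no admissible force helps, so f = 0.
        return 0.0
    # Otherwise the best force lies on the cone boundary closest to g.
    return (g_n + mu * g_t) / math.sqrt(1.0 + mu * mu)

def q1_upper_bound(points, axes, directions, mu):
    """Min over 6D directions of the max over contacts of the wrench support."""
    best = math.inf
    for d in directions:
        d_f, d_tau = d[:3], d[3:]
        support = 0.0
        for p, axis in zip(points, axes):
            # d . [f; p x f] = (d_f + d_tau x p) . f, a 3D support query.
            g = [a + b for a, b in zip(d_f, cross(d_tau, p))]
            support = max(support, cone_support(g, axis, mu))
        best = min(best, support)
    return best
```

Every step is a closed-form expression built from dot products, square roots, `min`, and `max`, which is why an automatic differentiation tool can supply subgradients. The `max` over contacts also makes visible why only one contact point receives a nonzero gradient per direction.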
IV-B2 Derivatives of the Lower Bound
We have shown that computing an upper bound of reduces to a series of simple operations with well-defined subgradients. However, due to the max function in the computation of the upper bound, the subgradient is nonzero only for one of the contact points, which is less efficient for training. To resolve this problem, it has been shown in [13] that sum-of-squares (SOS) optimization can be used to compute a lower bound of . This theory can be extended to compute our generalized metric. If we define as a set of directions on the tangent plane, then the generalized can be found by solving the following SOS optimization problem:
(4)  
where we have extended the definition of to account for our generalization (eq:Q1EXT). eq:Q1L can be reduced to a semidefinite programming (SDP) problem and its gradients can be computed via the chain rule:
While the second term in the chain rule above can be computed directly via automatic differentiation, the first term requires a sensitivity analysis of an SDP problem, as shown in [31] (see supplementary material for more details). Since SDP is a smooth approximation of nonsmooth optimization, the derivatives are generally nonzero on all the contact points. As a result, each neural network update can adjust all the fingers of the gripper to generate better grasp poses, which is more efficient than the case with upper bound on . On the other hand, the cost of solving eq:Q1L is also higher than that of solving eq:Q1U because eq:Q1L involves an SDP solve. Note that a similar analysis for quadratic programming (QP) problems has been previously exploited for training neural networks in [2].
IV-C Geometry-Related Loss Functions
In this section, we show that the geometric terms such as can be computed robustly from a triangle mesh. We also formulate the collision-free requirement as a novel loss term. Geometric terms arise in many places in a grasping system. To compute the metric, we need to evaluate and . In addition, we need to avoid penetrations between grippers and the target objects. To perform these computations, we can introduce a monotonic loss function:
where is the weight of loss. To provide subgradients for all these terms, we need to plug into the chain rule. In this section, we show a robust method to compute for complex, watertight, triangle meshes of the target objects, which can be accelerated with the help of a bounding volume hierarchy (BVH). Note that it is easy to compute and its gradients from a signed distance field (SDF) [3], but we choose to use triangle meshes for two reasons. First, most existing 3D shape datasets, such as [7, 8, 55], use triangle meshes, and converting them to SDFs is time- and memory-consuming. Second, for very complex meshes, low-resolution SDFs cannot represent thin geometric features and determining an appropriate resolution of SDF is difficult.
Let’s assume that a triangle mesh consists of a set of triangles . Then the distance between and is the solution of the following QP problem:
where is the th vertex of . Finally, the signed distance is defined as:
(5)  
where is the outward normal of and is the sign function. Similarly, we can define the outward normal of to be:
(6) 
In these formulations, the sign function and the operator define a disjoint convex set with welldefined subgradients. The gradient of is:
The gradient of can be computed from the Dirichlet features on the triangle mesh to which belongs, as illustrated in fig:hessian. If the closest feature to is an edge , then we have:
If the closest feature to is a vertex , then we have:
If the closest feature to is inside a triangle, then .
Finally, eq:SDIST and eq:SNOR involve a loop over all triangles to find the one with smallest distance, which can be accelerated by building a BVH and quickly rejecting nodes where the bounding volume is further from than the current best distance [1].
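The distance query described in this section can be sketched without a QP solver or a BVH. The code below is a simplified stand-in, not the paper's implementation: it uses the standard closed-form closest-point test over the Voronoi regions of a triangle (vertices, edges, face), loops over all triangles by brute force, takes the sign from the normal of the closest triangle, and works in floating point rather than exact rational arithmetic. Triangles are assumed to be non-degenerate and oriented counter-clockwise when seen from outside; all function names are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def along(origin, direction, t):
    """Return origin + t * direction."""
    return tuple(o + t * d for o, d in zip(origin, direction))

def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle (a, b, c), found by Voronoi-region tests."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                   # closest feature: vertex a
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                   # closest feature: vertex b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return along(a, ab, d1 / (d1 - d3))        # closest feature: edge ab
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                   # closest feature: vertex c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return along(a, ac, d2 / (d2 - d6))        # closest feature: edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and d4 - d3 >= 0 and d5 - d6 >= 0:
        t = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return along(b, sub(c, b), t)              # closest feature: edge bc
    denom = va + vb + vc
    return along(along(a, ab, vb / denom), ac, vc / denom)  # face interior

def signed_distance(p, triangles):
    """Brute-force signed distance (negative inside) and its gradient at p."""
    best = None
    for a, b, c in triangles:
        q = closest_point_on_triangle(p, a, b, c)
        offset = sub(p, q)
        dist = math.sqrt(dot(offset, offset))
        if best is None or dist < best[0]:
            normal = cross(sub(b, a), sub(c, a))   # outward for CCW triangles
            sign = 1.0 if dot(offset, normal) >= 0.0 else -1.0
            best = (dist, sign, offset)
    dist, sign, offset = best
    # Away from the surface, the gradient is the signed unit offset direction.
    grad = tuple(sign * o / dist for o in offset) if dist > 0.0 else None
    return sign * dist, grad
```

A BVH would replace the loop in `signed_distance` by a traversal that skips nodes whose bounding volume is farther away than the current best distance. Taking the sign from a single face normal can be unreliable when the closest feature is an edge or a vertex and the point lies almost in a triangle's plane, which is the kind of decision exact arithmetic is meant to protect.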
In our experiments, the technique described above is computationally efficient but prone to floating-point round-off error. If a point is close to the triangle's plane, finite-precision floating-point arithmetic has difficulty deciding whether the point lies inside the triangle mesh or not. To solve this problem, we use exact rational arithmetic implemented in [16] to perform all the computations in this section and convert the results back to inexact, finite-precision floating-point numbers at the end of the computation.
IV-D Self-Collision of the Gripper
To prevent self-collisions, we add a term to penalize any collisions between different links of the gripper. Assuming the gripper has links, we first approximate the shape of each link using a convex hull and define as:
which can be trivially computed from H-representations of and can be accelerated using a bounding volume hierarchy. In practice, we use a small set of sample points to compute the generalized metric and another large set of sample points to compute to achieve better resolution of self-collisions.
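As an illustration of why an H-representation makes this term cheap, the sketch below measures how deep a sample point sits inside a convex hull given as a list of half-spaces. The exact penalty used in the paper is not reproduced here; the function names, the unit-normal assumption, and the plain sum over all pairs of distinct links are hypothetical choices.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def penetration_depth(x, halfspaces):
    """Depth of x inside the polytope {y : a . y <= b for every (a, b)}.

    Assumes each a has unit length. Returns 0.0 when x is outside.
    """
    # The smallest slack over all faces is the distance to the nearest face.
    margin = min(b - dot(a, x) for a, b in halfspaces)
    return max(0.0, margin)

def self_collision_loss(points_per_link, hulls):
    """Sum of penetration depths of each link's samples into other links' hulls."""
    loss = 0.0
    for i, points in enumerate(points_per_link):
        for j, hull in enumerate(hulls):
            if i == j:
                continue  # a link cannot collide with itself
            loss += sum(penetration_depth(x, hull) for x in points)
    return loss
```

The depth is piecewise linear in the sample point, so it has well-defined subgradients. A practical version would likely skip pairs of adjacent links, which touch at their shared joint by construction.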
IV-E Defending Against Degenerate Cases and Local Minima
Our generalized metric is similar to the standard metric in that it implies force closure. However, if an initial guess for the gripper pose has no force closure, then the metric value is zero and no gradient information is available. In this case, we add the following heuristic term to guide the optimization toward a force-closure pose with a high probability:
by ensuring that all the grasp points are as close to the object as possible. In addition, our generalized has many local minima due to nonlinearity and complex geometries of objects. To defend our neural network against these suboptimal solutions, we add a data loss to guide the training. We use Chamfer loss for our data term:
following previous works [10, 26], where is the ground truth grasp pose, is the Chamfer distance measure in the gripper’s configuration space. In other words, we precompute many ground truth grasp poses for each target object and let the neural network pick the grasp pose that leads to the minimal distance.
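The "pick the closest ground truth" behavior of the data term can be sketched as a minimum over candidate poses. The squared Euclidean distance in configuration space used below is an assumption, as is the function name; the paper's distance measure may weight the rigid transformation and the joint angles differently.

```python
def chamfer_data_loss(predicted, ground_truth_poses):
    """Squared distance from the predicted pose to its nearest ground truth pose.

    predicted: sequence of floats (rigid transformation and joint angles).
    ground_truth_poses: non-empty list of poses with the same length.
    """
    return min(
        sum((p - g) ** 2 for p, g in zip(predicted, pose))
        for pose in ground_truth_poses
    )
```

Because only the nearest ground truth pose contributes, the gradient pulls the prediction toward a single candidate instead of toward the average of all candidates, which may not be a valid grasp.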
IV-F Forward Kinematics
Our neural network predicts , which consists of the global rigid transformation and the joint angles to define the pose of a gripper. Further, the gradient with respect to the grasp points is propagated backward to via a forward kinematics layer denoted as , similar to [26, 46]. We make a minor modification to account for joint limits with nonvanishing gradients. If has joint limits in range , then we transform as follows:
which is guaranteed to satisfy the constraints and has nonvanishing gradients compared with the functions.
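A common way to obtain this behavior is to squash an unconstrained network output through a logistic function. The sketch below uses that construction as an assumption; it is not necessarily the transform used in the paper, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def squash_to_limits(raw, lower, upper):
    """Map an unconstrained value to a joint angle inside [lower, upper]."""
    return lower + (upper - lower) * sigmoid(raw)

def squash_derivative(raw, lower, upper):
    """Derivative of squash_to_limits with respect to raw."""
    s = sigmoid(raw)
    return (upper - lower) * s * (1.0 - s)

def clamp(raw, lower, upper):
    """Hard clipping, shown for contrast: its derivative is zero outside the limits."""
    return max(lower, min(upper, raw))
```

With hard clipping, a joint pushed beyond its limit stops receiving gradient and cannot recover during training. The smooth map keeps a positive derivative for every finite input, although the value can become very small far from zero.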
In summary, our learning system uses the following compound loss function:
where are various weights.
V Experimental Setup
Data Preparation: Following [10], we prepare a small dataset of 500 watertight objects by combining existing grasping datasets [7, 42, 24, 23]. We split the dataset into a training set (400 objects) and a test set (100 objects). It is known that predicting a single grasp pose from a single target object is an ambiguous problem because many grasp poses are equally effective [26]. Therefore, we use [30] to precompute a set of grasp poses for each target object and then use the Chamfer data loss to let the neural network pick the most representative grasp pose. This gives a dataset of 40K grasps, from which our neural network will select as ground truth. For our 24-DOF gripper, collecting these data requires about 150 CPU hours of computation on a cluster using a sampling-based grasp planner [30]. Finally, we assume that the neural network observes objects from a set of multi-view depth cameras of resolution . These images are obtained by rendering the triangle mesh of each target object into the depth channel. As a result, each sample in our dataset is a tuple of depth images, triangle mesh, and ground truth grasp poses. After collecting our dataset, we augment it by rotating each target object and gripper 8 times about 8 symmetric axes.
Gripper Setup: In all our simulated and real-world experiments, we use a (6+18)-DOF Shadow Hand as our gripper as shown in fig:gripper, which is mounted onto a UR10 arm. However, during the training phase, the DOFs of the arm are not predicted by our neural network. These DOFs are computed at runtime using a conventional motion planner. We use the SrArmCommander [12] to move the UR10 arm to the target poses and use the SrHandCommander [12] to move the Shadow Hand fingers to the target joint states. During the training phase, we manually label 45 potential grasp points on the gripper and, to detect self-collisions, we use a denser sample of 15,555 potential contact points using Poisson disk sampling, as illustrated in fig:gripper.
Neural Network: We deploy ResNet-50 as a feature extractor for multi-view depth images. Each depth image is duplicated into 3 channels to meet the input requirement of ResNet-50. A shared ResNet-50 takes the multi-view depth images as input and outputs one 2,048-dimensional vector per view. These vectors are max-pooled and fed to a fully connected layer whose output dimension is equal to the gripper's DOF (6+18 for the Shadow Hand). The outputs of the fully connected layer are the predicted gripper configurations.
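The pooling and regression head described above can be sketched with plain lists standing in for the per-view feature vectors. This is a toy illustration with hypothetical function names: the shared ResNet-50 backbone is not included, and the sketch assumes that max-pooling is applied element-wise across views.

```python
def max_pool_views(view_features):
    """Element-wise maximum over the per-view feature vectors."""
    return [max(column) for column in zip(*view_features)]

def fully_connected(features, weights, bias):
    """One linear layer: returns weights @ features + bias."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]

def predict_pose(view_features, weights, bias):
    """Pooled multi-view features mapped to one gripper configuration vector."""
    return fully_connected(max_pool_views(view_features), weights, bias)
```

Pooling across views makes the prediction independent of the order in which the views are supplied.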
Training configurations: We use the parameters listed in table:param for in both settings. Our neural network is trained using the ADAM algorithm [25] with a batch size of 16. The initial learning rate is set to 1e-4 and decayed by a factor of 0.9 every 20 epochs. All experiments are carried out on a desktop with 2 Intel Xeon Silver 4208 CPUs, 32 GB RAM, and 2 NVIDIA RTX 2080 GPUs.
Parameters  m  
Value  6.0  8.0  0.7  0.001  64  1.0  1.0  0.1  1.0 
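The learning-rate schedule can be written as a one-line function. The sketch assumes that the initial rate is 1e-4 and that the decay is applied as a staircase, i.e. the rate is multiplied by 0.9 once after every 20 completed epochs; the function name and the staircase reading are assumptions.

```python
def learning_rate(epoch, initial=1e-4, decay=0.9, step=20):
    """Staircase schedule: multiply the rate by `decay` every `step` epochs."""
    return initial * decay ** (epoch // step)
```

Under these assumptions the rate stays at 1e-4 for epochs 0 to 19, drops to 9e-5 at epoch 20, and to 8.1e-5 at epoch 40.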
Vi Experimental Results
In this section, we evaluate the performance of different settings for high-DOF grasp planning. Our method can be used either as a standalone grasp planner or as a method to train grasp-predicting neural networks.
VI-A Grasp Planning Without Ground Truth
The differentiable grasp metric and collision loss allow our method to be used as a standalone, locally optimal grasp planner. To set up this experiment, we replace the neural network with a DOF optimizable vector of gripper pose, set , and minimize with respect to . As compared with [30], our planner only provides locally optimal solutions and the computational cost is comparable. An example is illustrated in fig:hand_mode_more, where we use a trivial initialization shown as the transparent green poses and after minutes of optimization, our optimizer converges to the gray poses. However, without guidance of data, our planner can fall into local minima without force closure () as shown in fig:hand_mode.
VI-B Learning Grasp Poses With Ground Truth
In our second benchmark, we use our method to guide the training of a grasp-pose-predicting neural network. The training is performed in two phases. First, we pre-train by setting , i.e. excluding our differentiable loss. This step brings the neural network close to nearly optimal values and we run 35 epochs of learning at the first stage. Second, we fine-tune the network by adding our differentiable loss and use weights summarized in table:param. We run 71 epochs of learning at the second stage. After training, a set of predicted grasp poses on the test set are shown in fig:postprocess, which shows that the quality of grasp is drastically improved when guided by the data term. The pre-training takes 4 hours and the fine-tuning takes 36 hours. On average, each forward-backward propagation takes 0.85s with our additional loss function and 0.61s without it, so our additional loss functions add about 0.24s per iteration to gradient computation.
VI-C Comparison
We have compared the (standard) metric [15] of grasps produced by our method with that of grasps generated using the sampling-based grasp planner [30] in fig:NN_mode_more. The results show that the qualities of our grasp poses are on par with those of [30]. We have also compared with prior work [26], which also trains a grasp-pose-predicting neural network on a small dataset of a similar size to ours. However, the algorithm in [26] requires post-processing to resolve penetrations and collisions. Instead, the grasp poses predicted using our method can be directly deployed onto physical hardware without post-processing. As illustrated in fig:postprocess and table:comp_liu, our method can significantly improve the quality of grasp poses.
Method  Metric  Penetration  Success Plan  Success Grasp 

Ours  0.23  3.5mm  38  33 
[26]  0.11  14.2mm  35  27 
VI-D Grasping With the Arm on Physical Hardware
As our final evaluation, we deploy our learned neural network onto our physical platform. Our method does not require RGB input and only uses the depth channel. Therefore, we do not perform any sim-to-real transfer. Our neural network only predicts the gripper pose and does not predict the configuration for the UR10 arm to achieve the predicted position and orientation. These configurations of the arm are computed using a motion planner at runtime. We choose 50 YCB objects from our 100 test objects. Our depth cameras are calibrated beforehand to make the camera poses exactly the same as the poses used for training.
To profile the rate of success on the 50 YCB objects, we use two metrics summarized in table:comp_liu. First, we record how many times the motion planner can successfully move the gripper to the predicted position (Success Plan). This metric measures the ability of our method to avoid penetrations and collisions with the desk that objects are put on, since a pose with penetrations or desk collisions cannot be achieved by a motion planner. Second, we record how many times the grasp planner can successfully lift the object (Success Grasp). This metric measures the ability of our method to improve the grasp quality. Our method outperforms [26] in terms of both metrics. We observe a improvement in terms of Success Plan, and of improvement in terms of Success Grasp. Our neural network failed on the objects due to slippage.
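The two success metrics can be turned into rates with simple arithmetic. The sketch below assumes that the Success Plan and Success Grasp entries of table:comp_liu are counts out of 50 trials, one per YCB object; the paper does not state this explicitly, so the resulting figures are illustrative only.

```python
TRIALS = 50  # assumed: one trial per selected YCB object

# Counts read from the comparison table: (Success Plan, Success Grasp).
OURS = (38, 33)
BASELINE = (35, 27)

def success_rate(successes, trials=TRIALS):
    """Fraction of trials that succeeded."""
    return successes / trials

def improvement_points(ours, baseline, trials=TRIALS):
    """Difference between two success rates, in percentage points."""
    return 100.0 * (ours - baseline) / trials

plan_gain = improvement_points(OURS[0], BASELINE[0])
grasp_gain = improvement_points(OURS[1], BASELINE[1])
```

Under this assumption the gap is 6 percentage points for Success Plan and 12 percentage points for Success Grasp.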
VII Conclusion and Limitations
We present a differentiable grasp planner that enables a neural network to be trained with a small dataset and a simplified architecture. Our differentiable loss accounts for various requirements for a good grasp, including high grasp metric values and collision-free gripper poses. We use a generalized definition to allow inexact contact and we show that the subgradients of each loss term are well-defined and can be efficiently computed from target object shapes represented using watertight triangle meshes. We show that our method can be used both as a standalone grasp planner and as a neural network training algorithm. Finally, we show that the trained neural network performs robustly on unseen objects and hardware platforms.
Our current implementation suffers from several limitations. First, our method requires the target objects be watertight and have a nonzero volume. Although we do not require a signed distance field transformation, our method still computes a signed value of distance, which is impossible when the target object is a thin shell. A limitation related to this problem is that our method suffers from tunneling. In other words, when the target object is very thin, a stochastic update of our neural network might result in the hand going from one side to the other side of the object, leading to missed solutions. In the future, this problem can be resolved using continuous collision detection [6].
Second, our experimental setup and neural network architecture prevent the neural network from predicting multiple grasp poses for a single object. If other constraints in the workspace prevent the predicted grasp pose from being achieved, then our method fails. However, this problem can be resolved by using adversarial training similar to [14, 33], where a distribution of grasp poses is learned. We emphasize that more sophisticated learning algorithms are orthogonal to our approach and can be combined with it.
VIII Appendix
In this appendix, we provide some details on computing the $Q_1$ lower bound and its derivatives. First, we derive a slightly different formulation of the lower bound using quadratic frictional cones. Compared with the linearized frictional cones used in [13], quadratic frictional cones reduce the problem size of the semidefinite program.
1 $Q_1$ Lower Bound Using Quadratic Frictional Cone
We rederive the lower bound of $Q_1$ using SOS optimization as done in [13], but with quadratic frictional cones. Given a set of points, each with a normal and two tangent directions, the cones are defined as:
where we have:
The dual cones are easily found to be:
The induced SOS problem is:
(7) 
Note that Eq. (7) induces an SDP problem with exactly the same polynomial order as the original SDP problem induced in [13], but with fewer cones and a smaller linear system when performing sensitivity analysis. Finally, we briefly prove the correctness of the dual cones.
Lemma VIII.1
The dual cone of is .
Proof: If , then for any , we have:
For the other direction, if there is an such that , then we can pick a point in as follows:
such that:
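For readers who want the argument spelled out, the following is a sketch of the standard dual-cone computation for a single quadratic (second-order) friction cone in force space. The notation is assumed for illustration and may differ from the symbols used in the lemma: $f$ is a contact force, $n$ a unit normal, $t^1, t^2$ unit tangents orthogonal to $n$ and to each other, and $\mu > 0$ a friction coefficient.

```latex
% Assumed notation (not necessarily that of the lemma):
%   f_n = f \cdot n,  f_t = (f \cdot t^1, f \cdot t^2), and likewise w_n, w_t.
\begin{align*}
\mathcal{C}   &= \{ f \mid \|f_t\| \le \mu f_n \}, \\
\mathcal{C}^* &= \{ w \mid w \cdot f \ge 0 \;\; \forall f \in \mathcal{C} \}
               = \{ w \mid \mu \|w_t\| \le w_n \}.
\end{align*}
% Direction 1: if \mu \|w_t\| \le w_n, then for any f \in \mathcal{C}:
\begin{align*}
w \cdot f = w_n f_n + w_t \cdot f_t
          \ge w_n f_n - \|w_t\| \|f_t\|
          \ge f_n \left( w_n - \mu \|w_t\| \right) \ge 0 .
\end{align*}
% Direction 2: if \mu \|w_t\| > w_n and w_t \neq 0, pick the boundary point
\begin{align*}
f = n - \mu \frac{w_t^1 t^1 + w_t^2 t^2}{\|w_t\|} \in \mathcal{C},
\qquad
w \cdot f = w_n - \mu \|w_t\| < 0 ,
\end{align*}
% so w \notin \mathcal{C}^*. If w_t = 0, then w_n < 0 and f = n gives w \cdot f < 0.
```

The second direction follows the same pattern as the proof above: a point on the boundary of the cone is chosen so that its inner product with the candidate dual vector is negative.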
2 SDP Sensitivity Analysis for Lower Bound of $Q_1$
In this section, we present an efficient way to perform sensitivity analysis for SOS problems. We use the same notation as [31]. A standard SDP takes the form:
where there are PSD cones in our problem ( equals the number of tangent directions if linearized frictional cones are used and if quadratic frictional cones are used). The dual variable to the th cone is and we have . We also define the coefficient matrix:
In an SOS problem, the first PSD cone and the other cones are of two different types. The first PSD cone specifies the conditional polynomial positivity condition. The other cones specify the positivity of Lagrangian multipliers. We also observe that some variables only affect the first PSD cone and other variables affect the other PSD cones. Therefore, we can write the matrix in a block form as follows:
where it is straightforward to verify that the variables can be chosen so that the bottom-right block of the coefficient matrix is an identity matrix. When the SDP is solved using a primal-dual interior point method, the primal and dual solutions are computed simultaneously, with the dual variables defined as:
where we apply the same decomposition of cones for . Next, we apply the optimality condition of the SDP:
where is the symmetric Kronecker product operator. Applying sensitivity analysis with respect to an arbitrary parameter , we have:
where:
In the following derivation, we assume that and satisfy strict complementarity. Note that if strict complementarity is not satisfied, then the SDP problem is not differentiable. Prior work [31] showed that and are simultaneously diagonalizable, and so are and :
By plugging these identities into the sensitivity equation, we get:
where there are 8 equations. For th rows, we have:
(8)  
By plugging Eq. (8) into the st row, we have:
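The role of strict complementarity in SDP sensitivity analysis can be checked numerically on a toy problem. The sketch below is not the SOS problem of this appendix and does not build the symmetric Kronecker linear system; it assumes the small SDP min <C, X> subject to tr(X) = 1 and X PSD, whose optimal value is the smallest eigenvalue of C. All names (`sdp_value_and_grad`, `C_of`, `dC_dtheta`) are hypothetical. For this SDP, strict complementarity holds exactly when the smallest eigenvalue is simple, and the derivative of the optimal value with respect to a parameter is <dC, X> at the primal optimum X.

```python
import numpy as np


def sdp_value_and_grad(C, dC):
    """Solve min <C, X> s.t. tr(X) = 1, X PSD in closed form.

    Returns the optimal value, its derivative along dC, the primal
    optimum X, and the dual slack S = C - y * I with y the optimal value.
    """
    evals, evecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    if evals[1] - evals[0] < 1e-9:
        # rank(X) + rank(S) < n: the value is not differentiable here.
        raise ValueError("strict complementarity fails: repeated smallest eigenvalue")
    v = evecs[:, 0]
    X = np.outer(v, v)                      # rank-one primal optimum
    S = C - evals[0] * np.eye(C.shape[0])   # PSD dual slack, X @ S = 0
    grad = float(np.sum(dC * X))            # <dC, X>
    return float(evals[0]), grad, X, S


def C_of(theta):
    """Hypothetical cost matrix depending on a scalar parameter theta."""
    return np.array([[2.0, theta, 0.0],
                     [theta, 1.0, 0.5],
                     [0.0, 0.5, 3.0]])


# Derivative of C_of with respect to theta.
dC_dtheta = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])

value, grad, X, S = sdp_value_and_grad(C_of(0.3), dC_dtheta)
```

Comparing `grad` against a central finite difference of `value` in theta is a quick sanity check for this kind of analytic derivative. The check breaks down near parameter values where the smallest eigenvalue becomes repeated, which mirrors the non-differentiability noted above.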