Nonprehensile manipulation of objects is a practical skill used frequently by humans to displace objects from an initial configuration to a desired final one with a minimum effort. In robotics, this type of manipulation can be more advantageous than the traditional pick-and-place approach when an object cannot be easily grasped by the robot, due to the design of the end-effector and the size of the object, or the obstacles surrounding the manipulated object. For example, combined pushing and grasping actions have been shown to succeed where traditional grasp planners fail, and to work well under uncertainty [17, 15, 28, 27, 26, 42, 50].
. Recent techniques for planar sliding mechanics focus on learning data-driven models for predicting the motions of the pushed objects in simulation. While this problem can be solved to a certain extent by using generic end-to-end machine learning tools such as neural networks, model identification methods that are explicitly derived from the equations of motion are generally more efficient . A promising new direction is to directly differentiate the prediction error with respect to the model of the object’s mechanical properties, such as its mass and friction distributions, and to use standard gradient descent algorithms to search for values of the properties that reduce the gap between simulated motions and observed ones. Unfortunately, most popular physics engines do not natively provide the derivatives of the predicted poses [18, 10, 41, 6, 39], and the only way to differentiate them is through numerical finite differences, which are expensive computationally.
In the present work, we propose a method for safely sliding an unknown object from an initial configuration to a target one. The proposed approach integrates a model identification algorithm with a planner. To account for non-uniform surface properties and mass distributions, the object is modeled as a large set of small cuboids that may have different material properties and that are attached to each other with fixed rigid joints. A simulation error function is given as the distance between the centers of the objects in simulated trajectories and the true observed ones. The gradient of the simulation error is used to search for the object’s mass and friction distributions.
The main contribution is the derivation of the analytical gradient of the simulation error with respect to the mass and friction distributions using the proposed cuboid representation of objects. The second contribution is the use of the derived analytical gradients for identifying models of unknown objects by using a robotic manipulator, and demonstrating the computational and data efficiency of the proposed approach. The last contribution is the use of the proposed integrated method for pre-grasp sliding manipulation of thin unknown objects.
Ii Related Work
Algorithms for Model-based reinforcement learning
Model-based reinforcement learningexplicitly learn the unknown dynamics, often from scratch, and search for an optimal policy accordingly [14, 30, 47, 55, 1]. The unknown dynamics are often modeled using an off-the-shelf statistical learning algorithm, such as a Gaussian Process (GP) [45, 13, 7, 33, 3, 40], or a neural network [38, 8, 19, 20]. This approach was recently used to collect images of pushed objects and build models of their motions [43, 45, 37, 4]. While the proposed method belongs to the category of model-based RL, it differs from most related methods by the explicit use of the dynamics equations, which drastically improves its data-efficiency. The identified mass distribution can also be used to predict the balance and stability of the object in new configurations that are not covered in the training data. For instance, a GP or a neural net cannot predict if an object remains stable when pushed to the edge of a support table, unless such an example is included in the training data, with the risk dropping the object and losing it.
The mechanics of pushing was explored in several past works [34, 31, 32, 24, 16, 54, 51, 53, 49], from both a theoretical and algorithmic point of view. Notably, Mason  derived the voting theorem to predict the rotation and translation of an object pushed by a point contact. A strategy for stable pushing when objects remain in contact with an end-effector was also proposed in . Yoshikawa and Kurisu  proposed a regression method for identifying the support points of a pushed object by dividing the support surface into a grid and approximating the measured frictional force and torque as the sum of unknown frictional forces applied on the grid’s cells. A similar setup was considered in  with a constraint to ensure positive friction coefficients. The limit surface is a convex set of all friction forces and torques that can be applied on an object in quasi-static pushing. The limit surface is often approximated as an ellipsoid , or a higher-order convex polynomial [54, 51, 53]. An ellipsoid approximation was also used to simulate the motion of a pushed object to perform a push-grasp . In contrast with our method, these works identify only the friction parameters, and assume that the mass distribution is known or irrelevant in a quasi-static regime.
There has been a recent surge of interest in developing natively differentiable physics engines [12, 2]. A combination of a learned and a differentiable simulator was used to predict effects of actions on planar objects , and to learn fluid parameters . Differentiable physics simulations were also used for manipulation planning and tool use 
. Recently, it has been observed that a standard physical simulation, formulated as a Linear Complementary Problem (LCP), is also differentiable and can be implemented in PyTorch. In , a differentiable contact model was used to allow for optimization of several locomotion and manipulation tasks.
Iii Problem Setup and Notation
We consider the problem of displacing a rigid object on a flat homogeneous surface from an initial pose to a desired final pose . The object has an unknown shape and material properties. We assume that a depth-sensing camera provides a partial 3D view of the object. The partial view contains only the object’s upper surface. A 3D shape is then automatically constructed by assuming that the occluded bottom side is flat. Most of the objects used in our experiments are not flat. However, the autonomously learned friction model simply assigns near-zero friction coefficients to the regions of the bottom surface that do not actually touch the tabletop. Thus, learned near-zero friction forces compensate for wrongly presumed flat regions in the occluded bottom part of the object.
We approximate the object as a finite set of small cuboids. The object is divided into large number of connected cells , using a regular grid structure. Each cell has its own local mass and coefficient of friction that can be different from the other cells. The object’s pose at time
is a vector incorresponding to the translation and rotation in the plane for each of the cells. In other terms, , where is the cell’s 2D position on the surface, and is its angle of rotation. Similarly, we denote the object’s generalized velocity (a twist) at time by , where is the cell’s linear velocity on the surface, and is its angular velocity. The object’s mass matrix is a diagonal matrix, where the diagonal is ,
is the moment of inertia of thecell of the object, and is its mass. where is the width of a cuboid. is a diagonal matrix, where the diagonal is . is the coefficient of friction between the cell of the object and the support surface. We assume that: , where is a given upper bound. An external generalized force (a wrench) denoted by is an vector , where and are respectively the force and torque applied on cell . External forces are generated from the contact between the object and a fingertip of the robotic hand used to push the object. We assume that at any given time , at most one cell of the object is in contact with the fingertip. Therefore, where is the index of the contacted cell at time .
A ground-truth trajectory is a state-action sequence , wherein is the observed pose and velocity of the pushed object, and is the external force applied at time , as defined above. A corresponding simulated trajectory is obtained by starting at the same initial state in the corresponding real trajectory, i.e., , and applying the same control sequence . Thus, the simulated trajectory results in a state-action sequence , where is the predicted next pose. Velocity is a vector corresponding to translation and angular velocities in the plane for each of the cells, it is predicted in simulation as . The goal is to identify mass distribution and friction map that result in simulated trajectories that are as close as possible to the real observed ones. Therefore, the objective is to solve the following optimization problem,
Since is a vector containing all cells’ positions, the loss is the sum of distances between each cell’s ground-truth pose and its predicted pose, which is equivalent to the average distance (ADD) metric as proposed in . In the following, we explain how velocity function is computed.
Iv Forward Simulation
We adopt here the formulation presented in [9, 5, 11]. We adapt and customize the formulation to exploit the proposed grid-structure representation, and we extend it to include frictional forces between a pushed object and a support surface. The transition function is given as where is the duration of a constant short time-step. Velocity is a function of force and mechanical parameters and . To find , we solve the system of equations of motion that we present in Figure 2, where and are inputs, are slack variables, are unknown vectors, and are hypothesized mass and friction matrices. is a vector corresponding to the main diagonal of .
is a global Jacobian matrix of all the adjacency constraints in the grid structure. These constraints ensure that the different cells of the object move together with the same velocity. is an matrix where is the number of cells, and is the number of pairs of adjacent cells.
If cell , whose four sides have length , is one of the two adjacent cells in the pair indexed by , then
is a variable vector that is multiplied by the Jacobian to generate the vector of impulses , which are time-integrals of internal forces that preserve the rigid structure of the object.
is an Jacobian matrix related to the frictional forces between the object’s cells and the support surface, and the corresponding constraints. The main block-diagonal of is , wherein
and the remaining entries of are all zeros. is a variable vector. is multiplied by to generate a vector of the frictional forces and torques between the support surface and each cell of the object. defines the direction of the frictional forces and torques as the opposite of its current velocity , whereas defines the scalar magnitudes of the frictional forces and torques.
The friction terms have complementary constraints, stated in Fig. 2. These constraints are used to distinguish between the cases when the object is moving and friction magnitudes are equal to , and the case when the object is stationary and the friction magnitudes are smaller than . When the object moves, and assuming that the change in the direction of motion happens smoothly, we have . Therefore, because of the constraints and and . Then because of the constraint . We conclude that from the constraint . Similarly, one can show that if .
To simulate a trajectory , we iteratively find velocities by solving the equations in Fig. 2 where are fixed inputs and the remaining variables are unknown. The solution is obtained, after an initialization step, by iteratively minimizing the residuals from the equations in Fig. 2, using the convex optimizer of .
V Mass and Friction Gradients
To obtain material parameters
, a gradient descent on the loss function in Equation1 is performed. A first approach to compute the gradient is to use the Autograd library for automatic derivation in Python. We propose here a second simpler and faster approach based on deriving analytically the closed forms of the gradients and .
Let us denote by the solutions for in the system in Fig. 2. Let us also use to denote a matrix that contains a vector as a main diagonal and zeros elsewhere. Finally, let refer to the main diagonal of . In other terms, . Then,
The differentials of the system are given as
wherein , , , are all zero matrices and vectors because and are fixed and treated as a constant since they are set to and in Equation 1. Also, because the applied force at time is given as a constant in the identification phase. The differentials can be arranged in the following matrix form: ,
wherein is defined as . We use the blockwise matrix inversion to compute ,
where and are square matrices, and and are invertible. Notice that to compute , we only need the upper quarter of , because the remaining raws will be multiplied by . Consequently, terms and do not matter here, and we only need the terms and . The first term corresponds to
In the model identification phase, we only utilize data points where the object actually moves when pushed by the robot. Thus, and , and
Using the blockwise matrix inversion, we find that
We will see in the following that the remaining matrices, and , will not be needed. Similarly, the top right of is . It is given as
The first term in is then
and the fourth term is
Since , then
In the following, we show how to use the equation above to derive and and use them in a coordinate descent algorithm to identify from data.
V-a Mass Gradient
We calculate while setting .
From the definition of the loss function in Equation 1, we can see that , wherein is the observed ground-truth pose of the object and is its predicted pose, computed as . Finally,
V-B Friction Gradient
We calculate while setting .
V-C Mass and Friction Identification Algorithm
The gradient of the loss function with respect to the mass and friction matrices and are used in Algorithm 1 to search for the ground-truth mass and friction. We present here the stochastic version where the gradient is computed for each trajectory separately. The gradient can also be computed in a batch mode from all trajectories. of data collected by the robot. The mass and friction are increased or decreased depending on the signs in the error vector , which corresponds to the reality gap. The main computational bottleneck is in computing weights and , which is linear in the number of cells because is a diagonal matrix, and is block diagonal thanks to the regular grid structure of the cells. Finally, we project the updated mass and friction matrices by rounding their values down to upper limits . The provided upper limits are the same for every cell in the object, whereas the mass and friction distributions learned by the algorithm are highly heterogenous, as will be shown in the experiments.
Vi Policy Gradient
After identifying the mass distribution and friction map using Algorithm 1, we search for a new sequence of forces to push the object toward a desired terminal goal configuration . Algorithm 2 summarizes the main steps of this process. We start by creating a rapidly exploring random tree () to find the shortest path from to . While searching for the shortest path, we eliminate from the tree object poses that are unstable (based on the identified mass distribution ) or that are in collision with other objects. returns a set of waypoints . At each iteration of the main loop of the algorithm, we find the nearest waypoint in and search for actions that would push the object toward it. A pushing force is parameterized by a contact point, a direction and a magnitude, as discussed in Section III. We focus here on optimizing the contact point, and we keep the magnitude constant. The direction of the force is chosen to be always horizontal. It is given as the opposite of the surface normal of the object at the contact point, projected down on the 2D plane of the support surface. This choice is made to avoid slippages and changes in contact points during a push.
A contact point is always located on the outer side of a cuboid (cell). Therefore, we limit the search to the outer cells of the grid. The objective of this search is to select a contact point that reduces the gap between the predicted pose of the object after pushing it, and the nearest waypoint that has not been reached yet. We select the initial contact point as the outer cell that is most aligned to the axis , where
is the estimated center of mass. The gradient of the gap with respect to the contact point is computed by using the finite-difference method. The contact point is moved in the direction that minimizes the gap until a local optimum is reached. Forceis then defined based on the selected contact point. The pose and velocity of the object are replaced by the predicted ones that result from applying force . This process is repeated until the object reaches the desired goal configuration. The time duration of each pushing action is also optimized by using finite differences. This part is omitted for simplicity’s sake.
The surface of the object often contains non-differentiable parts where the analytical gradient with respect to the contact point is undefined. Even on smooth parts, there is no clear advantage of computing the analytical gradient here, because the space of contact points is uni-dimensional, in contrast to the high-dimensional space of non-uniform mass and friction distributions. In low-dimensional search spaces, finite-difference methods are computationally efficient.