I Introduction
Models of physical interaction in robotics are driven by experimental laws of friction and impact. These laws, such as Coulomb friction, describe the macroscopic behavior of contact by compounding variations at the microscopic level. As a result, one expects them to be accurate at most in a statistical sense. We are interested in learning data-driven empirical models that capture more accurately the statistics of physical interaction, including its expected variability.
Pushing is a simple manipulation task that already shows interesting statistical behavior. It is primitive to our ability to manipulate objects large and small, and is often involved in more complex manipulation behaviors such as grasping. In previous work [2] we provided empirical evidence of the variability in the outcome of a planar push. Figure 1 shows that one series of pushes (center), indistinguishable to sensor and actuator resolution, yields divergent outcomes, while a different set of pushes (left) yields a more convergent set of outcomes. Some pushes are more precise than others, and some yield multimodal behavior (right).
In this paper we learn a compact data-driven model that captures the first two moments, i.e., mean and variance, of the expected behavior of a pushed object. We expect that models like this can be the basis for more realistic simulation, can aid the design of robust plans or control policies, and can yield more statistically sound inference. The approach, validation, and structure of this paper are as follows:

Pushing Data: We rely on the dataset by Yu et al. [2] to learn and test the model. In Section III we contribute an addendum to the dataset with repeated pushes in a grid of pushing locations and directions, designed to validate the variances predicted by the model (the new dataset is available online [7]).

Model: The learned model is based on a family of Gaussian processes called heteroscedastic Gaussian processes (HGPs), along with their state-of-the-art variational implementation [1]. This model targets phenomena with input-dependent noise, i.e., where the amount of noise introduced by the system depends on the action. Section V uses it to estimate the most likely outcome of a push and its variance.

Evaluation and comparisons: In Section VI we evaluate the push predictions for four objects sliding on four materials. The accuracy is measured by the normalized mean square error (NMSE) and the normalized log probability density (NLPD), and compared to standard Gaussian processes and a common analytical model [8].
Validation: Finally, in Section VII we validate the predicted probability distributions based on the KL divergence to ground-truth estimates of the distribution obtained from repeated pushes.
We finish in Section VIII with a discussion on contributions, limitations, and possible directions for future work.
II Related Work
There is significant excitement surrounding empirical data-driven techniques for robotic manipulation [9, 10]. Recently, Huang et al. [11] surveyed efforts to create datasets of object manipulation. The high-fidelity dataset on planar pushing interaction by Yu et al. [2] is especially relevant to this work. It contains recordings of pushing motions and forces across different dimensions of shape, material, pushing direction, location, velocity, and acceleration. It also provides empirical evidence of the variability of the pushing process, which is the basis of the learned models in this paper.
Over the years, several works have applied data-driven techniques to the problem of planar pushing [12, 13, 14, 15, 16]. Most recently, Zhou et al. [16] presented a data-driven but physics-inspired model for planar friction. The algorithm approximates the limit-surface representation of the relationship between frictional loads and motion twists at a planar contact, and is the state of the art in data-efficient friction modeling in robotics. All these algorithms study the problem of controlling a pushed object in a data-driven fashion but, to our knowledge, no previous work has attempted to model both the expected behavior and the experimental variability.
In this paper we use a probabilistic model of the Gaussian process family to predict the outcome of a push. Gaussian processes are often used to capture both the mean and variance of a dynamic system. For example, Paolini et al. [17] use Gaussian processes to learn both the transition dynamics and the observation model of prehensile manipulation tasks. In this paper we explicitly consider the dependence of the noise in the transition dynamics on the input action by using heteroscedastic Gaussian processes, introduced by Le and Smola [18]. Kersting et al. [19] proposed a simplified learning algorithm based on a maximum likelihood approach, which tends to underestimate noise levels. Lazaro-Gredilla and Titsias [1] solve this problem with a variational heteroscedastic Gaussian process algorithm, which we use in this work.
III Pusher-Slider Data
This paper focuses on modeling the probabilistic behavior of the pusher-slider system, illustrated in Figure 2. This section describes the data we use to learn and validate the model. We are interested in learning the behavior of the slider as an input-output relationship. As illustrated in Figure 3, the representation for the input space is:

- the magnitude of the velocity of the pusher,

- the contact point on the perimeter of the slider,

- the pushing angle,

and the representation for the output space is:

- the x displacement of the object's COM in the pusher reference frame,

- the y displacement of the object's COM in the pusher reference frame,

- the change in orientation,

after a push of fixed duration. These parameters are sufficient to characterize simple models of planar point pushing [8, 6].
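For concreteness, the input-output representation above can be sketched as a pair of 3-D vectors. This is a minimal illustration; the field ordering, units, and helper names are our own choices, not the dataset's actual schema.

```python
import numpy as np

# Hypothetical encoding of a single push (names and units are illustrative):
# input  = [pusher speed (mm/s), contact point along the edge (normalized),
#           pushing angle (rad) relative to the surface normal]
push_input = np.array([20.0, 0.5, 0.1])

# output = [COM x displacement (mm), COM y displacement (mm),
#           orientation change (rad)], all in the pusher reference frame
push_output = np.array([8.7, -0.3, 0.05])

def make_dataset(inputs, outputs):
    """Stack a list of pushes into (N, 3) input and (N, 3) output arrays."""
    X = np.atleast_2d(np.asarray(inputs, dtype=float))
    Y = np.atleast_2d(np.asarray(outputs, dtype=float))
    assert X.shape[1] == 3 and Y.shape[1] == 3 and len(X) == len(Y)
    return X, Y

X, Y = make_dataset([push_input], [push_output])
```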
In this paper we use two sets of data: a large-scale general-purpose dataset of planar pushing for learning, and a dataset with repeated pushes that we collected for the purpose of validating the results. In particular, we validate the model's prediction of the variance of the outcome of a push.
III-A Learning Data
In previous work [2] we captured a large set of planar pushes with the robotic setup in Figure 2, composed of a high-precision industrial robot fitted with a cylindrical pusher and stainless steel objects of different shapes sliding on surfaces of different materials. The system records the trajectories of the pusher and object and the force at the interaction. We will show in later sections that 100 samples are sufficient on average to outperform analytical models, and that model accuracy saturates with fewer than 1000 samples.
III-B Validation Data
Our goal is to reliably predict the object's expected motion and its variance:

\[ y \sim \mathcal{N}\!\left(\mu(x),\, \sigma^2(x)\right) \qquad (1) \]

where $x$ is the input/action and $\mu(x)$ and $\sigma^2(x)$ are the input-dependent expected outcome and variance. The dependence of the variance on the input is the key complexity we address in this paper, motivated by the example in Figure 1, and leads us to consider HGPs instead of standard GPs.
To validate the observation that the output noise depends on the input, we collected a new dataset in the same setup of Figure 2, containing 100 repetitions of each push considered, which gives us an approximate distribution of the object motion. In this new dataset, the pusher follows a straight trajectory 1 cm long at 20 mm/s. The initial contact angles cover a range spaced at 0.1-radian intervals, and we consider 11 different initial contact points evenly spaced along the side of the object. This produces a sufficiently dense grid of pushes, allowing us to extract for each push the expected mean and variance of the object motion. This dataset is available online [7].
Besides validating the proposed model, the new dataset also provides empirical evidence to study the variability of the pushing process, including multimodality effects.
IV Heteroscedastic Gaussian Processes
Gaussian processes (GPs) are a flexible and commonly used framework in robotics, with applications in areas such as manipulation, motion planning, and learning from demonstration [17, 20, 21]. In this paper, we apply GPs to the modeling of planar pushing.
Classical (homoscedastic) GPs [22] assume that the noise in the observations is Gaussian and constant over the input:

\[ y = f(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_n^2) \qquad (2) \]

where $y$ is an observation of the process at the input $x$, $f$ is the latent function that we want to regress, and $\varepsilon$ represents Gaussian noise with variance $\sigma_n^2$.
The assumption of constant Gaussian noise together with a GP prior on the latent function makes analytical inference possible for GPs. We consider that $f$ follows a GP prior with the Automatic Relevance Determination Squared Exponential (ARD-SE) kernel $k(\cdot,\cdot)$ [22]. Given the training set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, the probability of an observation $y_*$ at the input $x_*$ is given by:

\[ p(y_* \mid x_*, \mathcal{D}, \theta) = \mathcal{N}\!\left( k_*^{\top} (K + \sigma_n^2 I)^{-1} \mathbf{y},\;\; k_{**} - k_*^{\top} (K + \sigma_n^2 I)^{-1} k_* + \sigma_n^2 \right) \qquad (3) \]

where $K$ is the matrix that evaluates the ARD-SE kernel at the training points, $K_{ij} = k(x_i, x_j)$, $k_*$ is the vector with $k_{*,i} = k(x_i, x_*)$, and $k_{**}$ is the value of the kernel at $(x_*, x_*)$. Finally, $\mathbf{y}$ is the vector of observations from the training set, and $\theta$ is the set of hyperparameters, including $\sigma_n$ and the kernel parameters, that are optimized during the training process. The expected variance of an observation at the input $x_*$ in (3) is the sum of two variances: $\sigma_n^2$ and $\sigma_*^2 = k_{**} - k_*^{\top}(K + \sigma_n^2 I)^{-1} k_*$. While $\sigma_n^2$ is constant and represents the process noise, $\sigma_*^2$ depends on $x_*$ and is only related to the regression error (not to the real process from which the data originates). In practice, the term $\sigma_*^2$ depends strongly on the density of training points around the input $x_*$ and reflects how confident the GP regression is in the mean value $\mu_* = k_*^{\top}(K + \sigma_n^2 I)^{-1}\mathbf{y}$.
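The predictive mean and variance of Eq. (3) are straightforward to compute. The sketch below is a minimal illustration with fixed (untrained) hyperparameters; function names are our own, and hyperparameter optimization is omitted.

```python
import numpy as np

def ard_se_kernel(A, B, lengthscales, signal_var):
    """ARD-SE kernel: k(a, b) = s^2 exp(-0.5 * sum_d (a_d - b_d)^2 / l_d^2)."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def gp_predict(X, y, X_star, lengthscales, signal_var, noise_var):
    """Predictive mean and variance of a homoscedastic GP, as in Eq. (3).

    The returned variance is the sum of the regression variance (which
    shrinks near training data) and the constant process noise."""
    K = ard_se_kernel(X, X, lengthscales, signal_var)
    k_star = ard_se_kernel(X, X_star, lengthscales, signal_var)   # (N, M)
    k_ss = ard_se_kernel(X_star, X_star, lengthscales, signal_var)
    A = K + noise_var * np.eye(len(X))
    mean = k_star.T @ np.linalg.solve(A, y)
    var = np.diag(k_ss) - np.sum(k_star * np.linalg.solve(A, k_star), axis=0) + noise_var
    return mean, var
```

Near the training data the predicted variance collapses toward the process noise, while far from it the variance grows back to the prior, which is exactly the behavior described above for $\sigma_*^2$.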
Assuming that the process noise is constant over the input space is sometimes too restrictive. Allowing some input regions to be noisier than others is beneficial for practical applications where there are naturally stable and unstable dynamics. As we motivated in the introduction, this is the case for pushing: depending on the type of push, the resulting motion can be convergent or divergent (Figure 1).
Some extensions of GPs can incorporate input-dependent noise. These algorithms are referred to as heteroscedastic Gaussian processes (HGPs) and can regress both the mean of the process and its variance over the input space. Moreover, they also improve the mean estimation by giving more relevance to those observations obtained from less noisy inputs. In HGP regression, observations are drawn from:

\[ y = f(x) + \varepsilon(x), \qquad \varepsilon(x) \sim \mathcal{N}\!\left(0, \sigma_n^2(x)\right) \qquad (4) \]

where the noise variance $\sigma_n^2(x)$ now explicitly depends on $x$.
In this paper, we use the state-of-the-art algorithm for HGPs, the variational HGP (VHGP), proposed by Lazaro-Gredilla and Titsias [1]. Using Bayesian variational theory, they derive closed-form solutions for the mean and variance of the process under input-dependent noise. The resulting probability of an observation $y_*$ at the input $x_*$ is given by:

\[ q(y_* \mid x_*, \mathcal{D}) = \int \mathcal{N}\!\left(y_* \mid \mu_{f*},\, \sigma_{f*}^2 + e^{g_*}\right) \mathcal{N}\!\left(g_* \mid \mu_{g*},\, \sigma_{g*}^2\right) dg_* \qquad (5) \]

where two ARD-SE kernels, $k_f$ and $k_g$, model respectively the mean $f(x)$ and the logarithm of the variance $g(x) = \log \sigma_n^2(x)$. The hyperparameters include $\mu_0$, the mean of the log-variance process, and a positive semidefinite diagonal matrix $\Lambda$ of variational parameters, optimized together with the other hyperparameters using conjugate gradient descent.
The similarity between the inference equations for GPs and VHGPs is remarkable. The term $\sigma_{f*}^2$ again captures the mean error due to the regression process, and the expression for the predictive mean is essentially the same. The constant noise $\sigma_n^2$ of GPs, however, is substituted by an input-dependent expression, the expected noise of the process at the input $x_*$. In terms of computational cost, GPs and VHGPs scale alike with the amount of training data.
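The full variational treatment of [1] is beyond a short snippet, but the simpler two-stage scheme in the spirit of Kersting et al. [19] conveys the core idea of heteroscedastic regression: fit a GP to the data, then fit a second GP to the log squared residuals to model $g(x) = \log \sigma_n^2(x)$. The sketch below is a self-contained 1-D toy with fixed hyperparameters, not the algorithm used in this paper.

```python
import numpy as np

def se_kernel(a, b, ls, sv):
    """1-D squared-exponential kernel."""
    d = a[:, None] - b[None, :]
    return sv * np.exp(-0.5 * (d / ls) ** 2)

def gp_mean(X, y, X_star, ls, sv, nv):
    """Posterior mean of a homoscedastic GP with an SE kernel."""
    A = se_kernel(X, X, ls, sv) + nv * np.eye(len(X))
    return se_kernel(X_star, X, ls, sv) @ np.linalg.solve(A, y)

def two_stage_hgp(X, y, X_star, ls=1.0, sv=1.0, nv=1e-2):
    """Two-stage heteroscedastic regression in the spirit of Kersting et al. [19]:
    (1) fit a homoscedastic GP to the observations,
    (2) fit a second GP to the log squared residuals, approximating
        g(x) = log sigma_n^2(x).
    Returns the predictive mean and the input-dependent noise variance.
    As noted above, this tends to underestimate noise; the VHGP [1] avoids
    that bias via a variational bound."""
    residuals2 = np.maximum((y - gp_mean(X, y, X, ls, sv, nv)) ** 2, 1e-12)
    g_star = gp_mean(X, np.log(residuals2), X_star, ls, sv, nv)
    return gp_mean(X, y, X_star, ls, sv, nv), np.exp(g_star)
```

On data whose noise level grows with the input, the regressed variance grows accordingly, which is the qualitative behavior the pushing model needs.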
V The Learned Model
In this section we describe in more detail the process of learning the pushing model for the four objects in Figure 4 sliding on a horizontal surface of four different materials.
To compute the pushing model of a given object-surface pair, we train three independent VHGPs, one for each output: the x displacement, the y displacement, and the change in orientation. While not optimal from a data-efficiency perspective, since we neglect the existing correlations between the output variables, it proves sufficient in terms of performance as long as enough training data is used. For future work, it would be interesting to combine multi-task GP prediction with heteroscedastic GPs.
Once each VHGP is trained, we can visualize the effect of different pushes given a fixed velocity. Each push is defined by the contact point and the pushing angle. Figure 5 shows the regressed distribution for the case where the pusher moves at a fixed velocity, the object shape is a square, and the surface is plywood. In accordance with intuition, we see that the displacement in the pusher's direction is maximal when pushes are applied at the center of the edge and perpendicular to it. The maximum change in orientation happens when pushing between the edge center and a vertex, with an inclined pushing angle whose sign depends on the side of the contact point.
Analogously, Figure 6 shows the modeled and experimental standard deviation of the predicted pushes as a function of the contact location and direction. We observe that the magnitude of the noise is a significant fraction of the magnitude of the expected output, which further motivates the work in this paper.
There are well-defined regions that present more noise than others. To a fair degree, the prediction matches the validation data, also shown in Figure 6. The precision of the measurement equipment (a Vicon tracking system) suggests that the regressed noise comes principally from the pushing process and not from sensor noise. Further evidence is that the shape of the regressed noise remains more or less constant when considering the same object on a different surface.
A data-driven model that captures the uncertainty of interaction, beyond the deterministic predictions of standard analytical models, gives us a more complete perspective on the dynamics of pushing. This information can be used to differentiate between more and less stable pushes, and to improve multi-step prediction by propagating uncertainty.
VI Evaluation of the Learned Model
We evaluate the performance of the learned model with the standard metrics normalized mean square error (NMSE) and normalized log probability density (NLPD):

\[ \mathrm{NMSE} = \frac{\sum_{i=1}^{N} \left(y_i - \mu(x_i)\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} \qquad (6) \]

\[ \mathrm{NLPD} = -\frac{1}{N} \sum_{i=1}^{N} \log p\!\left(y_i \mid x_i\right) \qquad (7) \]

where $N$ is the number of elements in the test set and $y_i$ are the observations. We denote by $\mu(x_i)$ the predicted value of the VHGP model at $x_i$ and by $\bar{y}$ the mean of the observations in the training set. We consider the total NMSE of the model as the sum of the individual NMSEs for each output. For the total NLPD, we take $p(y_i \mid x_i)$ as the product of the individual probabilities of each output given their respective VHGP.
Note that NMSE, as defined in (6), does not take into account the variability of the motion and only considers its expected value. It reflects the squared distance between the model outputs and the real process, normalized by the variance of the observations. The NMSE is especially useful when comparing the results with deterministic models of pushing. The NLPD metric instead computes how likely the observations are according to a given probabilistic model. It is more appropriate for evaluating our data-driven models, as they also regress the uncertainty of the motion in (7). For the NLPD, lower values imply a higher likelihood of the data. The NLPD penalizes both under-confident and over-confident models, so deterministic models cannot be fairly evaluated with it.
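Both metrics of Eqs. (6) and (7) are simple to compute for Gaussian predictions. A minimal sketch follows; the function names are ours, and the Gaussian form of the predictive density in `nlpd` is an assumption that matches the GP/VHGP setting.

```python
import numpy as np

def nmse(y, mu):
    """Normalized mean square error, Eq. (6): squared prediction error
    normalized by the spread of the observations around their mean."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return np.sum((y - mu) ** 2) / np.sum((y - np.mean(y)) ** 2)

def nlpd(y, mu, var):
    """Negative log probability density, Eq. (7), for Gaussian predictions.
    Lower is better; both over- and under-confident variances are penalized."""
    y, mu, var = (np.asarray(a, float) for a in (y, mu, var))
    return np.mean(0.5 * np.log(2.0 * np.pi * var) + 0.5 * (y - mu) ** 2 / var)
```

A predictor that always outputs the training mean scores an NMSE of 1, so values below 1 indicate that the model explains part of the variability of the data.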
VI-A Evaluation on Different Objects and Materials
Figure 7 and Figure 8 show the evaluation of the model when trained on four different surfaces (plywood, Delrin, polyurethane, and ABS) and for four different object shapes (a square, a circle, and two ellipses, as shown in Figure 4). An advantage of VHGP models is that they do not require a large amount of data. The plots show that fewer than 1000 samples are sufficient to saturate performance, which is equivalent to collecting about 5 minutes of pushing experiments in our setup. Therefore our VHGP models can provide good planar push models of new objects and surfaces without an extensive data collection. The fact that VHGP performance remains more or less constant after a certain amount of data also suggests that the pushing motion is a sufficiently well-defined process, simple enough to be learned in general from a reduced exploration of the environment.
VI-B Comparison with Other Models
Outputs | Analytical | GP   | VHGP
NMSE    | 0.72       | 0.38 | 0.37
NLPD    | –          | 2.82 | 3.73
In this section we compare the performance of the VHGP against a standard GP model and a commonly used analytical model [8]. GP and VHGP show a similar NMSE. This is expected, since NMSE only evaluates the most likely outcome, and GPs and VHGPs have a very similar formulation for the regressed mean. The difference in NLPD, however, justifies the need for input-dependent noise to explain the stochastic behavior of pushing. As all inputs in a classical GP are assumed to have the same noise, all observations become equally weighted during training. Instead, in VHGPs, observations from lower-noise regions have higher relevance in the regression process.
We also compare our data-driven model with the analytical model proposed by Lynch et al. [8], which relies on the assumptions of quasi-static interaction, uniform pressure distribution, uniform Coulomb friction, and an ellipsoidal approximation to the limit surface.
The proposed data-driven model, instead of relying on a perfect knowledge of the object-surface interaction, outputs a distribution of the motion's outcome that encloses unexpected behaviors. Figure 9 shows that both GPs and VHGPs outperform the analytical model after approximately 100 samples.
We also observed that the analytical model is less reliable at high velocities. This is reasonable, as the analytical model assumes a quasi-static interaction and does not take inertia into account. In our model, those dynamic effects are captured by adding the pusher velocity as an input and through the uncertainty of the distribution.
VII Validation
The training and testing sets from [2] do not contain enough repeated pushes to estimate a ground-truth variance for a given push. To validate the distributions predicted by our model, we captured a benchmark dataset that incorporates repeated trajectories so that a reliable notion of uncertainty can be extracted from them (the dataset is available at [7]). The differences between repeated pushes should mainly reflect the uncertainty in the object-surface interaction, i.e., the stochastic side of the push itself. Given the samples of each repeated push, we compute the expected motion of the object and its variance from the benchmark. Section III-B contains a more detailed description of the dataset, and the bottom rows of Figure 5 and Figure 6 illustrate the obtained means and variances.
To evaluate the model learned from the training dataset in [2] against the benchmark dataset, we use the average KL divergence over the pushes. If we assume that the real distribution of each push comes from three independent Gaussians, we can directly use the KL divergence between two Gaussian distributions:

\[ D_{\mathrm{KL}}\!\left(\mathcal{N}_1 \,\|\, \mathcal{N}_2\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2} \qquad (8) \]

where $\mathcal{N}_1 = \mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}_2 = \mathcal{N}(\mu_2, \sigma_2^2)$.
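The closed form of Eq. (8), summed over the three outputs under the independence assumption, is a few lines of code. The function names and the argument ordering (which distribution is taken as ground truth) are our own conventions.

```python
import numpy as np

def kl_gaussian(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians, Eq. (8)."""
    return 0.5 * np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2) - 0.5

def kl_push(mu_a, var_a, mu_b, var_b):
    """Total KL over the three outputs of a push. Since the outputs are
    treated as independent Gaussians, the joint KL is the sum of the
    per-output terms."""
    mu_a, var_a, mu_b, var_b = (np.asarray(v, float) for v in (mu_a, var_a, mu_b, var_b))
    return float(np.sum(kl_gaussian(mu_a, var_a, mu_b, var_b)))
```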
The average and median KL divergences of each model show that our model not only regresses the expected motion properly, but also predicts its distribution reasonably well (Table II). This is crucial to create robust probabilistic models that can later be incorporated into multi-step planning and control.
KL divergence | GP    | VHGP
Average KL    | 22.63 | 15.32
Median KL     | 5.74  | 5.34
VIII Discussion
In recent work [2, 23] we provided empirical evidence that planar frictional interaction shows non-trivial statistical behavior, and suggested that a probabilistic model might yield a practical and less over-confident approach to modeling frictional contact. This paper is a step in that direction. We focus on the problem of planar pushing, which has proven essential for many types of interaction, both simple and complex, and for which the robotics community has developed a good and long-standing analytical understanding.
Input-dependent noise. This work starts from the empirical observation that the magnitude of the observed variability of a given push can vary up to an order of magnitude with the pushing action, i.e., the pushing location, direction, or velocity. A model that accounts for that variability could be used, for example, to avoid actions that yield unpredictable behavior.
Data-driven modeling. Building on the recent pushing dataset by Yu et al. [2], we propose a data-driven modeling approach based on a variational heteroscedastic Gaussian process (VHGP) model to capture the mean and variance of a planar frictional push. Unlike traditional Gaussian processes, which assume a level of output noise independent of the input, heteroscedastic Gaussian processes model both the mean and the variance of the outcome of a planar push as functions of the input pushing action.
The learned models are specific to a particular object and material. Generalization over materials and shapes is an interesting question that we plan to explore in future work, which will require models that are less computationally expensive to train, such as those based on sparse Gaussian processes [24] or random features [25].
Evaluation. We show that on the order of 100 samples are sufficient to surpass the performance of a standard analytical model, and evaluate the improvement of the VHGP framework over a traditional homoscedastic GP. The accuracy of the VHGP model tails off after on the order of 1000 training samples, data that can be captured in about 5 minutes in our training setup.
We validate the model against a new dataset collected in the same setup as [2] for the purpose of benchmarking. This new dataset is composed of 100 identical pushes for each of more than 300 different pushing actions, which gives an empirical footprint of the stochasticity of planar pushing. The performance, evaluated by means of the KL divergence, shows a clear improvement over a standard GP.
Gaussianity vs. multimodality. While VHGPs are the state of the art for data-efficient regression of input-dependent processes, Gaussianity and unimodality of the underlying dynamics remain key assumptions. We know that these do not always hold (Figure 10). However, a model with input-dependent noise can express multimodal behavior when integrated over time. We are interested in further exploring the idea of capturing finite multimodal behavior with unimodal Gaussian instantaneous models.
Dependence on velocity. A very common assumption in robotic manipulation is to neglect the inertial effects of interaction, i.e., the quasi-static assumption. In this paper we avoid that assumption by making the velocity of the pusher an explicit input to the model.
To evaluate the importance of considering velocity as an input, we group data from different velocities by time-scaling the interaction (under the quasi-static assumption, an action executed twice as fast has the same outcome, in half the time) and train models without velocity as an input. If the system is indeed quasi-static, combining data from different velocities should increase the amount of training data and yield better (or at least no worse) performance. Otherwise, we should see the performance worsen. Figure 11 shows the evolution of the performance of the learned model as we add data from larger velocities, all time-scaled to 10 mm/s. The performance peaks at around 50-80 mm/s and degrades after that. We conclude that the quasi-static assumption does not hold beyond that range, and hence there is value in adding velocity as an explicit input.
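The time-scaling step above can be sketched as follows. This is a minimal illustration under the quasi-static assumption; the function name and record fields are our own, not part of the dataset format.

```python
import numpy as np

def timescale_push(speed, duration, displacement, ref_speed=10.0):
    """Rescale a push to a reference pusher speed (mm/s) under the
    quasi-static assumption: the outcome (displacement) is unchanged,
    only the time axis stretches by speed / ref_speed so that the
    pushed distance stays the same."""
    scale = speed / ref_speed
    return {
        "speed": ref_speed,
        "duration": duration * scale,                      # same pushed distance
        "displacement": np.asarray(displacement, float),   # outcome unchanged
    }
```

For example, a push executed at 20 mm/s for 1 s maps to an equivalent push at 10 mm/s lasting 2 s, covering the same distance, which is exactly the equivalence that breaks down once inertial effects matter.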
References

[1] M. Lazaro-Gredilla and M. Titsias, "Variational Heteroscedastic Gaussian Process Regression," in ICML, 2011.

[2] K.-T. Yu, M. Bauza, N. Fazeli, and A. Rodriguez, "More than a Million Ways to be Pushed: A High-Fidelity Experimental Data Set of Planar Pushing," in IROS, 2016.

[3] M. T. Mason, "Mechanics and Planning of Manipulator Pushing Operations," IJRR, vol. 5, no. 3, 1986.

[4] S. Goyal, A. Ruina, and J. Papadopoulos, "Planar Sliding with Dry Friction Part 1. Limit Surface and Moment Function," Wear, vol. 143, 1991.

[5] K. M. Lynch and M. T. Mason, "Stable Pushing: Mechanics, Controllability, and Planning," IJRR, vol. 15, no. 6, 1996.

[6] F. Hogan and A. Rodriguez, "Feedback Control of the Pusher-Slider System: A Story of Hybrid and Underactuated Contact Dynamics," in WAFR, 2016.

[7] Website for the data set. [Online]. Available: https://mcube.mit.edu/pushdataset

[8] K. M. Lynch, H. Maekawa, and K. Tanie, "Manipulation and active sensing by pushing using tactile feedback," in IROS, 1992.

[9] L. Pinto and A. Gupta, "Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours," in ICRA, 2016.

[10] P. Agrawal, A. Nair, P. Abbeel, J. Malik, and S. Levine, "Learning to poke by poking: Experiential learning of intuitive physics," in NIPS, 2016.

[11] Y. Huang, M. Bianchi, M. Liarokapis, and Y. Sun, "Recent data sets on object manipulation: A survey," Big Data, vol. 4, no. 4, 2016.

[12] M. Salganicoff, G. Metta, A. Oddera, and G. Sandini, "A vision-based learning method for pushing manipulation," IRCS-93-47, University of Pennsylvania, Department of Computer and Information Science, Tech. Rep., 1993.

[13] S. Walker and J. K. Salisbury, "Pushing Using Learned Manipulation Maps," in ICRA, 2008.

[14] M. Lau, J. Mitani, and T. Igarashi, "Automatic Learning of Pushing Strategy for Delivery of Irregular-Shaped Objects," in ICRA, 2011.

[15] T. Meric, M. Veloso, and H. Akin, "Push-manipulation of complex passive mobile objects using experimentally acquired motion models," Autonomous Robots, vol. 38, no. 3, 2015.

[16] J. Zhou, R. Paolini, J. A. Bagnell, and M. T. Mason, "A Convex Polynomial Force-Motion Model for Planar Sliding: Identification and Application," in ICRA, 2016.

[17] R. Paolini, A. Rodriguez, S. S. Srinivasa, and M. T. Mason, "A Data-Driven Statistical Framework for Post-Grasp Manipulation," IJRR, vol. 33, no. 4, 2014.

[18] Q. Le and A. Smola, "Heteroscedastic Gaussian process regression," in ICML, 2005.

[19] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard, "Most likely heteroscedastic Gaussian process regression," in ICML, 2007.

[20] M. Mukadam, X. Yan, and B. Boots, "Gaussian Process Motion Planning," in ICRA, 2016.

[21] S. Choi, K. Lee, and S. Oh, "Robust learning from demonstration using leveraged Gaussian processes and sparse-constrained optimization," in ICRA, 2016.

[22] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.

[23] R. Kolbert, N. Chavan-Dafle, and A. Rodriguez, "Experimental Validation of Contact Dynamics for In-Hand Manipulation," in ISER, 2016.

[24] E. Snelson and Z. Ghahramani, "Sparse Gaussian Processes using Pseudo-inputs," in NIPS, 2006.

[25] A. Rahimi and B. Recht, "Random Features for Large-Scale Kernel Machines," in NIPS, 2008.