Models of physical interaction in robotics are driven by experimental laws of friction and impact. These laws, such as Coulomb friction, describe the macroscopic behavior of contact by compounding variations at the microscopic level. As a result, one expects them to be accurate at most in a statistical sense. We are interested in learning data-driven empirical models that capture more accurately statistical physical interaction, including its expected variability.
Pushing is a simple manipulation task which already shows interesting statistical behavior. It is primitive to our ability to manipulate objects large and small, and is often involved in more complex manipulation behaviors such as grasping. In previous work  we provide empirical evidence of the variability in the outcome of a planar push. Figure 1 shows that a series of pushes (center), indistinguishable to sensor and actuator resolution, yields divergent outcomes, while a different set of pushes (left) yields a more convergent set of outcomes. Some pushes are more precise than others, and some yield multi-modal behavior (right).
In this paper we learn a compact data-driven model that captures the first two moments, i.e., mean and variance, of the expected behavior of a pushed object. We expect that models like this can be the basis for more realistic simulation, can aid in the design of robust plans or control policies, and can yield more statistically sound inference. The approach, validation, and structure of this paper, are as follows:
Pushing Data: We rely on the dataset by Yu et al.  to learn and test the model. We contribute in Section III with an addendum to the dataset with repeated pushes in a grid of pushing locations and directions designed to validate the variances predicted by the model (the new dataset is available online ).
Model: The learned model is based on a family of Gaussian processes called Heteroscedastic Gaussian processes (HGPs), along with their state-of-the-art variational implementation . This model targets phenomena with input-dependent noise, i.e., when the amount of noise introduced by the system depends on the action. Section V
uses it to estimate the most likely outcome of a push and its variance.
Evaluation and comparisons: In Section VI
we evaluate the push predictions for four objects sliding on four materials. The accuracy is measured by the mean square prediction error (NMSE) and the normalized log probability density (NLPD), and compared to normal Gaussian processes and a common analytical model.
We finish in Section VIII with a discussion on contributions, limitations, and possible directions for future work.
Ii Related Work
There is significant excitement surrounding empirical data-driven techniques for robotic manipulation [9, 10]. Recently Huang et al.  surveyed efforts to create datasets of object manipulation. The high-fidelity dataset on planar pushing interaction by Yu et al.  is specially relevant to this work. It contains recordings of pushing motions and forces for different dimensions of shape, material, pushing direction, location, velocity, and acceleration. It also provides empirical evidence of the variability of the pushing process, which is the basis of the learned models in this paper.
Over the years, several works have applied data-driven techniques to the problem of planar pushing [12, 13, 14, 15, 16]. Most recently, Zhou et al.  presented a data-driven but physics-inspired model for planar friction. The algorithm approximates the limit surface representation of the relationship between frictional loads and motion twists at a planar contact, and is the state-of-the-art in data-efficient friction modeling in robotics. All these algorithms study the problem of controlling a pushed object in a data-driven fashion, but to our knowledge, no previous work has attempted to model both the expected behavior and the experimental variability.
In this paper we use a probabilistic model of the Gaussian processes family to predict the outcome of a push. Gaussian processes are used often to capture both mean and variance of a dynamic system. For example Paolini et al.  use Gaussian processes to learn both the transition dynamics and the observation model of prehensile manipulation tasks. In this paper we explicitly consider the dependence of the noise in the transition dynamics with the input action by using heteroscedastic Gaussian processes introduced by Le and Smola . Kersting et al.  proposed a simplified learning algorithm based on maximum likelihood approach, which tends to underestimate noise levels. Lazaro-Gredilla and Titsias  solve this problem by introducing a variational heteroscedastic Gaussian process algorithm which we use in this work.
Iii Pusher-Slider Data
This paper focuses on modeling the probabilistic behavior of the pusher-slider system, illustrated in Figure 2. This section describes the data we use to learn and validate the model. We are interested in learning the behavior of the slider as an input-output relationship. As illustrated in Figure 3 the representation for the input space is:
Magnitude of the velocity of the pusher.
Contact point on the perimeter of the slider.
and the representation for the output space is:
COM x displacement in the pusher ref. frame.
COM y displacement in the pusher ref. frame.
In this paper we use two sets of data: a large-scale general purpose dataset of planar pushing for learning, and a dataset with repeated pushes we collected for the purpose of validating the results. In particular we validate the model prediction of the variance for the outcome of a push.
Iii-a Learning Data
In previous work  we captured a large set of planar pushes with the robotic setup in Figure 2, composed of a high-precision industrial robot fitted with a cylindrical pusher and stainless steel objects of different shapes sliding on surfaces of different materials. The system records the trajectories of the pusher and object and the force at the interaction. We will show in later sections that 100 samples are sufficient in average to outperform analytical models, and model accuracy saturates with less than 1000 samples.
Iii-B Validation Data
Our goal is to predict reliably the object’s expected motion and its variance:
where is the input/action and and are the input-dependent expected outcome and variance. The dependence of the variance with the input is the key complexity we address in this paper, motivated by the example in Figure 1, and leads us to consider HGPs instead of standard GPs.
To validate the observation that the output noise depends on the input, we collected a new dataset in the same setup in Figure 2 containing 100 repetitions of each push considered, which gives us an approximate distribution of the object motion. In this new dataset, the pusher follows a straight trajectory of 1cm long at 20mm/s. The initial contact angles go from to radians spaced by 0.1 radians while we consider 11 different initial contact points evenly spaced on the side of the object. This produces a sufficiently dense grid of pushes allowing us to extract for each push the expected mean and variance of the object motion. This dataset is available online .
Besides validating the proposed model, the new dataset also provides empirical evidence to study the variability of the pushing process, including multi-modality effects.
Iv Heteroscedastic Gaussian processes
Gaussian processes (GPs) are a flexible and commonly used framework in robotics. We find them in many different areas in robotics such as manipulation, motion planning or learning from demonstration [17, 20, 21]. In this paper, we apply GPs to the modeling of planar pushing.
Classical (homoscedastic) GPs  assume that the noise from the observations is Gaussian and constant over the input:
where is an observation of the process at the input , is the latent function that we want to regress and represents Gaussian noise with variance .
The assumption of constant Gaussian noise together with a GP prior on the latent function makes analytical inference possible for GPs (IV). We consider that follows a GP prior where is the Automatic Relevance Determination Squared Exponential (ARD-SE) kernel . Given the training set , the probability of an observation at the input is given by:
where is a matrix that evaluates the ARD-SE kernel in the training points, ,
is a vector withand is the value of the kernel at , . Finally, represents the vector of observations from the training, and
is the set of hyperparameters includingand the kernel parameters that are optimized during the training process.
The expected variance of an observation at the input comes from the addition of two variances: and (IV). While is constant and represents the process noise, depends on and is only related to the regression error (not to the real process from which the data originates). In practice, the term depends highly on the density of training points around the input and reflects how confident is the GPs regression on the mean value .
Assuming that the process noise is constant over the input space is sometimes too restricting. Allowing some input regions to be more noisy than others is beneficial for practical applications where there are naturally stable and unstable dynamics. As we motivated in the introduction, this is the case for pushing where depending on the type of push, the resulting motion can be convergent or divergent (Figure 1).
Some extensions of GPs can incorporate input-dependent noise. These algorithms are referred as heteroscedastic Gaussian processes (HGPs) and can regress both the mean of the process and its variance over the input space. Moreover, they also improve the mean estimation by giving more relevance to those observations obtained from less noisy inputs. In HGP regression, observations are drawn from:
where now explicitly depends on .
In this paper, we use the state of art algorithm for HGPs called variational HGP (VHGP) proposed by Lazaro-Gredilla and Titsias . Using Bayesian variational theory they derive closed-form solutions for the mean and variance of the process considering input dependent noise. As a result, the probability of an observation is given by:
where and are the ARD-SE kernels of the mean and the logarithm of the variance . The matrix is diagonal with , is the hyperparameter of the log-variance mean and is a positive semidefinite diagonal matrix optimized together with the other hyperparameters using conjugate gradient descent.
The similarity between the inference equations for GPs and VHGPs is remarkable. The term makes equally reference to the mean error due to the regression process and the equation for is essentially the same. The constant noise in GPs however is substituted by the input-dependent expression , the expected noise of the process at the input . In terms of computational cost, GPs and VHGPs scale alike with the amount of training data.
V The learned model
In this section we describe in more detail the process to learn the pushing model for the four objects in Figure 4 sliding on a horizontal surface of four different materials.
To compute the pushing model of a certain pair object-surface, we train three independent VHGPs, one for each output , , and . While not optimal from a data-efficiency perspective, since we neglect the existing correlations between the outputs variables, it proves sufficient in terms of performance as long as enough training data is used. For future work it would be interesting to combine multi-task GP prediction with heteroscedastic GPs.
Once each VHGP is trained, we visualize the effect of different pushes given a fixed velocity. Each push is defined by the contact point and the pushing angle . Figure 5 shows the regressed distribution for the case where the pusher’s velocity is mm/s, the object shape is a square, and the surface is plywood. In accordance with intuition, we see that the displacement in the pusher’s direction is maximum when pushes are done at the center of the edge, , and perpendicular to the edge, . The maximum change in orientation happens when pushing in between the edge center and the vertex with a pushing angle of if and if .
Analogously, Figure 6 shows the modeled and experimental standard deviation of the predicted pushes, as a function of the contact location and direction. We observe that the magnitude of the noise is in between and of the magnitude of the expected output, which is significant, and further motivates the work in this paper.
There are well defined regions that present more noise than others. To a certain degree the prediction matches well with the validation data, also shown in Figure 6. The precision of the measurement equipment (Vicon tracking system), suggests that the regressed noise comes principally from the pushing process and not from sensor noise. It is further indicative that the shape of the regressed noise remains more o less constant when considering the same object but a different surface.
A data-driven model that captures the uncertainty of interaction, beyond the deterministic predictions of standard analytical models gives us a more complete perspective of the dynamics of pushing. This information can be used to differentiate between more and less stable pushes, and improve multi-step prediction, by propagating uncertainty.
Vi Evaluation of the Learned Model
We evaluate the performance of the learned model with the standard metrics normalized mean square error (NMSE) and normalized log probability density (NLPD):
where is the number of elements in the test set and the observations. We note with the predicted value from the VHGP model at and with the mean of the observations in the training set. We consider the total NMSE of the model as the sum of the individual NMSEs for each output. For the total NLPD, we consider as the product of the individual probabilities of each output given their respective VHGP.
Note that NMSE, as defined in (6), does not take into account the variability of motion and only considers its expected value . It reflects the squared distance between the model outputs and the real process normalized by the variance of the observations. The NMSE is especially useful when comparing the results with deterministic models of pushing. The NLPD metric instead, computes how likely are the observations according to a given probabilistic model. It is more appropriated for evaluating our data-driven models as they also regress the uncertainty of the motion in (7). For the NLPD, lower results imply a higher likelihood of the data. Consequently NLPD penalizes both underconfident and overconfident models, so deterministic models can not be fairly evaluated using NLPD.
Vi-a Evaluation on Different Objects and Materials
Figure 7 and Figure 8 show the evaluation of the model when trained on four different surfaces (plywood, Delrin, polyurethane, and ABS) and for four different object shapes (square, circle and two ellipses, as shown in Figure 4). An advantage of VHGPs models is that they do not require a large amount of data. The plots show that less than 1000 samples are sufficient to saturate performance, which is equivalent to collecting about 5 minutes of pushing experiments in our setup if we use s. Therefore our VHGP models can provide good planar push models of new objects and surfaces without having to go through an extensive data collection. The fact that VHGPs performance remains more or less constant after a certain amount of data also suggests that the pushing motion is a sufficiently well defined process and simple enough to be learned in general from a reduced exploration of the environment.
Vi-B Comparison with Other Models
In this section we compare the performance of VHGP against a standard GP model, and a commonly used analytical model . GP and VHGP show a similar NMSE. This is expected since NMSE only evaluates the most likely outcome, and both GPs and VHGPs have a very similar formulation for the regressed mean. The difference in NLPD however, justifies the need of input-dependent noise to explain the stochastic behaviour of pushing. As all inputs in a classical GP are supposed to have the same noise, all the observations become equally weighted during training. Instead, in VHGPs those observations from lower noise regions have higher relevance in the regression process.
We also compare our data-driven model with the analytical model proposed by Lynch et al. , that relies on assumptions of quasi-static, uniform pressure distribution, uniform coulomb friction, and an ellipsoidal approximation to the limit surface.
The proposed data-driven model, instead of relying on a perfect knowledge of the object-surface interaction, outputs a distribution of the motion’s outcome that encloses unexpected behaviours. Figure 9 shows that both GPs and VHGPs outperform the analytical model after 100 samples approximately.
We also observed that for high velocities the analytical model is more unreliable. This is reasonable as the analytical model assumes a quasi-static interaction and does not take into account inertia. In our model, those dynamic effects are captured by adding the pusher velocity as an input and through the uncertainty of the distribution.
The training and testing set used from  does not contain sufficient repeated pushes to estimate ground truth variance for a given push. To validate the distributions predicted by our model, we captured a benchmark dataset that incorporates repeated trajectories so that a reliable notion of uncertainty can be extracted from them (the dataset is available in ). The differences between repeated pushes should mainly include the uncertainty in the object-surface interaction, i.e., the stochastic side of the push itself. Given the samples of each repeated push, we compute the expected motion of the object and its variance from the benchmark. Section III-B contains a more detailed description of the dataset, and the bottom rows of Figure 5 and Figure 6 illustrate the obtained means and variances.
To evaluate the results of the model learned from the training dataset in 
, against the benchmark dataset, we use the average KL divergence over the pushes. If we assume that the real distribution of each push comes from three independent Gaussians, we can directly use the KL divergence between two Gaussian distributions:
where and .
The average and median KL divergences of each model show that our model not only regresses properly the expected motion, but also can predict reasonably well its distribution (Table II). This is crucial to create robust probabilistic models that can be latter incorporated into multi-step planning and control.
In recent work [2, 23] we provided empirical evidence that planar frictional interaction shows non-trivial statistical behavior, and suggested that a probabilistic model might yield a practical and less over-confident approach to model frictional contact. This paper is a step in that direction. We focus on the problem of planar pushing, which has proven essential for many types of interaction, both simple and complex, and for which the robotics community has developed a good and long-standing analytical understanding.
Input-dependent noise. This work starts from the empirical observation that the magnitude of the observed variability under a constant push can vary up to an order of magnitude with the pushing action, i.e., pushing location, pushing direction or pushing velocity. A model that accounts for that variability could be used, for example, to avoid actions that yield unpredictable behavior.
Data-driven modeling. Building from the recent pushing dataset by Yu et al. , we propose to use a data-driven modeling approach based on a Variational Heteroscedastic Gaussian process model (VHGP) to capture the mean and variance of a planar frictional push. Unlike traditional Gaussian processes, which assume a level of output noise independent of their input, heteroscedastic Gaussian processes model the dependence of both the mean and variance of the outcome of a planar push as functions of the input pushing action.
The learned models are specific to the particular object and material. Generalization over materials and shapes is an interesting question that we plan to explore in future work, which will require a model less computationally expensive to train, such as those based on sparse Gaussian processes  or Random Features .
Evaluation. We show that an order of 100 samples is sufficient to overcome the performance of a standard analytical model, and evaluate the improvement of the VHGP framework over a traditional homoscedastic GP. The accuracy of the VHGP model tails off at about training samples, data that can be captured in about 5 minutes in our training setup.
We validate the model against a new dataset collected in the same setup as  for the purpose of benchmarking. This new dataset is composed of 100 identical pushes for each of a set of more than 300 different pushing actions, which gives an empirical footprint of the stochasticity of planar pushing. The performance, evaluated by means of the KL divergence shows a clear improvement over a normal GP.
Gaussianity vs. multi-modality. While VHGPs are the state-of-the-art for data efficient regression for input dependent processes, Gaussianity and unimodality of the underlying dynamics is still a key assumption. We know that this is not always the case (Figure 10). However, a model with input-dependent noise can express multi-modal behavior when integrated over time. We are interested in exploring further the idea of capturing finite multi-modal behavior with uni-modal Gaussian instantaneous models.
Dependence with velocity. A very common assumption in robotic manipulation is to neglect the inertial effects of interaction, i.e. the quasi-static assumption. In this paper we avoid that assumption by making the velocity of the pusher an explicit input to the model.
To evaluate the importance of considering velocity as an input, we group data from different velocities by time-scaling the interaction (under the quasi-static assumption, an action executed twice as fast will have the same outcome, except in half the time) and train models without the velocity as input. If the system is indeed quasi-static, combining the data from different velocities should increase the amount of training data and yield better (or not worse) performance. Otherwise, we should see the performance worsen. Figure 11 shows the evolution of the performance of the learned model as we add data from larger velocities, all time-scaled to 10mm/s. We can see that the performance peaks at around 50-80mm/s and degrades after that. We conclude that the quasi-static assumption does not hold after that, and hence there is value in adding velocity as an explicit input.
- Lazaro-Gredilla and Titsias  M. Lazaro-Gredilla and M. Titsias, “Variational Heteroscedastic Gaussian Process Regression,” in ICML, 2011.
- Yu et al.  K.-T. Yu, M. Bauza, N. Fazeli, and A. Rodriguez, “More than a Million Ways to be Pushed. A High-Fidelity Experimental Data Set of Planar Pushing,” in IROS, 2016.
- Mason  M. T. Mason, “Mechanics and Planning of Manipulator Pushing Operations,” IJRR, vol. 5, no. 3, 1986.
- Goyal et al.  S. Goyal, A. Ruina, and J. Papadopoulos, “Planar Sliding with Dry Friction Part 1 . Limit Surface and Moment Function,” Wear, vol. 143, 1991.
- Lynch and Mason  K. M. Lynch and M. T. Mason, “Stable Pushing: Mechanics, Controllability, and Planning,” IJRR, vol. 15, no. 6, 1996.
- Hogan and Rodriguez  F. Hogan and A. Rodriguez, “Feedback Control of the Pusher-Slider System: A Story of Hybrid and Underactuated Contact Dynamics,” in WAFR, 2016.
-  Website for the data set. [Online]. Available: https://mcube.mit.edu/push-dataset
- Lynch et al.  K. M. Lynch, H. Maekawa, and K. Tanie, “Manipulation and active sensing by pushing using tactile feedback.” in IROS, 1992.
Pinto and Gupta 
L. Pinto and A. Gupta, “Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours,” inICRA, 2016.
- Agrawal et al.  P. Agrawal, A. Nair, P. Abbeel, J. Malik, and S. Levine, “Learning to poke by poking: Experiential learning of intuitive physics,” in NIPS, 2016.
- Huang et al.  Y. Huang, M. Bianchi, M. Liarokapis, and Y. Sun, “Recent data sets on object manipulation: A survey,” in Big Data, vol. 4, no. 4, 2016.
- Salganicoff et al.  M. Salganicoff, G. Metta, A. Oddera, and G. Sandini, “A vision-based learning method for pushing manipulation,” IRCS-93-47, U. of Pennsylvania, Department of Computer and Information Science, Tech. Rep., 1993.
- Walker and Salisbury  S. Walker and J. K. Salisbury, “Pushing Using Learned Manipulation Maps,” in ICRA, 2008.
- Lau et al.  M. Lau, J. Mitani, and T. Igarashi, “Automatic Learning of Pushing Strategy for Delivery of Irregular-Shaped Objects,” in ICRA, 2011.
- Meric et al.  T. Meric, M. Veloso, and H. Akin, “Push-manipulation of complex passive mobile objects using experimentally acquired motion models,” Autonomous Robots, vol. 38, no. 3, 2015.
- Zhou et al.  J. Zhou, R. Paolini, J. A. Bagnell, and M. T. Mason, “A Convex Polynomial Force-Motion Model for Planar Sliding: Identification and Application,” in ICRA, 2016.
- Paolini et al.  R. Paolini, A. Rodriguez, S. S. Srinivasa, and M. T. Mason, “A Data-Driven Statistical Framework for Post-Grasp Manipulation,” IJRR, vol. 33, no. 4, 2014.
- Le and Smola  Q. Le and A. Smola, “Heteroscedastic gaussian process regression,” in ICML, 2005.
- Kersting et al.  K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard, “Most likely heteroscedastic gaussian process regression,” in ICML, 2007.
- M. Mukadam and Boots  X. Y. M. Mukadam and B. Boots, “Gaussian Process Motion planning,” in ICRA, 2016.
- S. Choi and Oh  K. L. S. Choi and S. Oh, “Robust learning from demonstration using leveraged Gaussian processes and sparse-constrained optimization,” in ICRA, 2016.
Rasmussen and Williams 
C. Rasmussen and C. Williams,
Gaussian Processes for Machine Learning. MIT Press, 2006.
- Kolbert et al.  R. Kolbert, N. Chavan Dafle, and A. Rodriguez, “Experimental Validation of Contact Dynamics for In-Hand Manipulation,” in ISER, 2016.
- Snelson and Ghahramani  E. Snelson and Z. Ghahramani, “Sparse Gaussian Processes using Pseudo-inputs,” in NIPS, 2006.
- Rahimi and Recht  A. Rahimi and B. Recht, “Random Features for Large-Scale Kernel Machines,” in NIPS, 2008.