1 Introduction
Control theory has a rich history of designing reactive, reliable, and accurate controllers by leveraging simple and approximate dynamical models of the real world. Because feedback reevaluates the state of the system in real time, approximate dynamical models that capture the essential properties of the system can be effective at applying corrective decisions. In contrast to traditional model-based control approaches, a growing trend in the robotics community is to learn control policies directly from experience, which typically requires a very large quantity of training data to achieve good performance. This paper explores the data complexity required to control manipulation tasks with a model-based approach, where the model is learned from data. We apply this methodology to the problem of pushing an object on a planar surface, and find that we can design effective control policies with small data requirements (less than datapoints) while achieving accurate closed-loop performance.
We are particularly interested in contact-rich robotic tasks where the dynamics are largely dominated by frictional interactions. Such tasks remain challenging for both model-based and learning-based control approaches. Classical physics-based control methods struggle to control such systems because their motion equations, which are hybrid and underactuated, are not amenable to standard controller design. While model-free approaches do not require a description of the motion model, they typically rely on large quantities of data, which makes their generalization to more complex tasks challenging.
In this paper, we investigate a planar manipulation problem where the goal is to control the pose of an object using a robotic pusher. Planar pushing is a minimal example of an underactuated manipulation task where the object motion is dominated by frictional interactions [1]. The accurate modeling of planar pushing from first principles has proved difficult, due to unknown coefficients of friction and indeterminacies in the pressure distribution between the object and the surface where it slides.
This paper aims to show that it is possible to accurately control a mechanical system such as planar pushing, with a very small amount of experimental data, through a flexible control architecture that combines GP regression with MPC. While each of these pieces has been separately studied in the context of planar pushing, the novelty of the paper lies in: 1) the combination of both; and 2) the insight that a very small amount of data is sufficient to control the system (less than 10 data points). The contributions of the paper are:

Model-based control policy where the pushing model is learned directly from data. We show that a model learned with Gaussian processes can be effectively used in an MPC framework.

Performance comparison between analytical and data-driven controllers. We use both analytical and learned models of pushing within the same MPC framework.

Study of the data-complexity requirements needed to achieve stable control.
A key result is that while around datapoints are sufficient to match the performance of the analytical controller, a much smaller number, in the order of , already produces functional stable tracking behavior. A video showing our approach and the experiments can be found at https://youtu.be/Z45O480pij0.
The paper is structured as follows: planar pushing modeling, controller design, experimental results, and discussion. Both theory and results are presented for two different modeling strategies: analytical and data-driven. The analytical model refers to a pushing model derived from first principles using Newtonian and frictional mechanics. This approach leads to hybrid dynamics that make feedback control design difficult [1]. The data-driven approach is based on a smooth Gaussian process model learned from data, which proves effective for control purposes.
2 Related Work
The problem of planar pushing has a rich literature due to its theoretical and practical importance as one of the simplest nonprehensile manipulation problems. Since Mason [2] introduced the problem in 1986, there has been a wealth of research on its modeling, planning, and control.
Early work on pushing focused on first-principles models of planar pushing interactions. Due to indeterminacies in the pressure distribution between an object and its support surface, Mason [2] introduced the voting theorem that can resolve the direction of rotation of an object under an external pushing action without explicit knowledge of the pressure distribution. Following Mason’s seminal work, several researchers have proposed practical models, most notably Goyal et al. [3], who introduced the concept of the limit surface, and Lynch et al. [4], who used it to model the dynamics of planar pushing.
In recent years, researchers have turned to data-driven techniques to improve the accuracy of models of pushing interactions [5, 6, 7, 8, 9]. Of particular interest is Zhou et al. [9], which presents a physics-inspired data-driven model for systems with planar contacts. The algorithm approximates the limit surface as the level set of a convex polynomial.
The limit surface has proven to be a valuable tool for simulation [4] and planning [10], but has remained challenging for controller design due to its hybrid nature, i.e., contacts can stick or slide. To address this issue, recent work has either restricted the control to predefined dynamical regimes [11] or applied heuristic methods to plan through different contact modes in real time. Another approach has been to deploy data-driven methods to control pushing tasks [12, 13, 14, 15, 16, 17].
3 Planar Pushing Modeling
The dynamics of planar pushing are notoriously difficult to model due to uncertainties in the system’s coefficients of friction and the indeterminacy in pressure distribution between the object and the support surface. By assuming quasi-static interactions (negligible inertial forces) and an ellipsoidal limit surface, Lynch et al. [4] derive an analytical mapping between pusher and object velocities. To circumvent the approximations made in Lynch et al. [4], Bauza and Rodriguez [18] have recently employed a data-driven approach to capture the dynamics of the system without relying on uncertain parameters such as pressure distributions or coefficients of friction. This section details both the analytical and data-driven models used in subsequent sections for controller design.
3.1 Analytical Model
Figure 1(a) illustrates the planar pushing system, where denote the geometric center of the object and its orientation in the world frame. The term relates the tangential distance between the pusher and the centerline of the object in the body frame.
When the pusher interacts with the object, it impresses a normal force , a tangential frictional force , and torque about the center of mass. Assuming quasistatic interactions, the applied force causes the object to move in the perpendicular direction to the limit surface, , as defined by Zhou et al. [9]. As a result the object twist in the body frame is given by where the applied wrench can be written as with , , and . The system’s motion equations are
(1) 
where
is the state vector and
the control input. Due to the nature of physical interactions, the applied forces , and the relative contact velocity must obey frictional contact laws. Coulomb’s frictional model is depicted in Fig. 1(b), where three different regimes between the pusher and the object are identified: sliding right, sticking, and sliding left. Coulomb’s frictional model can be expressed using the mathematical constraints and when the pusher is sticking relative to the object, and when the pusher is sliding left relative to the object, and and when the pusher is sliding right relative to the object. As investigated in Hogan and Rodriguez [1], these discontinuous constraints in the dynamics lead to a hybrid system that makes controller design challenging.
3.2 Data Driven Model
As an alternative to the analytical model, we consider the data-driven approach proposed by Bauza and Rodriguez [18] that better captures the complex frictional interactions between pusher, object, and support surface. Bauza and Rodriguez [18] showed that as few as samples are enough to train a Gaussian process (GP) to surpass the accuracy of the analytical model.
We train a GP to model for each output using a zero-mean prior and the Automatic Relevance Determination (ARD) squared exponential kernel function where is the signal variance and is a diagonal matrix with the estimated characteristic length scales of each input dimension [19].
To collect data, the robot executes pushes with random initial contact position and direction as in Yu et al. [20]. The angle describes the orientation of the pusher relative to the object body frame where , and , denote the pusher velocity in the body frame. The learning problem is defined as:
Inputs: , as defined above.
Outputs: , where , , and represent the displacement of the object’s center and change in orientation in the body frame for the duration of the push . Figure 2 shows the model obtained for depending on the training datapoints.
By leveraging the quasi-static assumption, which neglects inertial effects, the model is learned for a predetermined velocity and scaled proportionally with the velocity of the pusher to recover the object velocity in the body frame as , where is the pusher velocity in the body frame.
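As a rough illustration of the regression and scaling steps described above, the following NumPy sketch computes the posterior mean of a zero-mean GP with an ARD squared exponential kernel and then rescales a predicted displacement to a different pusher speed. The function names, the fixed hyperparameters, the toy training data, and the reference speed `v_ref` are assumptions made for illustration and are not taken from the paper; in practice the hyperparameters would be fitted to the pushing data [19], with one GP per output dimension.

```python
import numpy as np

def ard_se_kernel(X1, X2, signal_var, length_scales):
    """ARD squared exponential kernel between rows of X1 (n, d) and X2 (m, d)."""
    # Each input dimension is scaled by its own characteristic length scale.
    A = X1 / length_scales
    B = X2 / length_scales
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def gp_posterior_mean(X_train, y_train, X_query, signal_var, length_scales, noise_var):
    """Posterior mean of a zero-mean GP evaluated at the rows of X_query."""
    K = ard_se_kernel(X_train, X_train, signal_var, length_scales)
    K = K + noise_var * np.eye(len(X_train))
    K_star = ard_se_kernel(X_query, X_train, signal_var, length_scales)
    # Solve K alpha = y with a Cholesky factor instead of an explicit inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha

def scale_to_pusher_speed(displacement, push_duration, v_ref, v_pusher):
    """Quasi-static scaling: object velocity is proportional to pusher speed."""
    return (displacement / push_duration) * (v_pusher / v_ref)

# Toy data (hypothetical): 2-D inputs and a single scalar output.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(20, 2))
y_train = np.sin(2.0 * X_train[:, 0]) + 0.5 * X_train[:, 1]
X_query = np.array([[0.1, -0.2]])

mean = gp_posterior_mean(
    X_train, y_train, X_query,
    signal_var=1.0, length_scales=np.array([0.5, 1.0]), noise_var=1e-4,
)
```

Only the posterior mean is computed here because that is the quantity the controller propagates; the predictive variance is omitted for brevity.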
We write the data-driven motion equations in a similar form to (1). The velocity of the pusher relative to the object is resolved in the body frame as
(2) 
The data-driven motion equations are
(3) 
where the control input for the data-driven model is .
4 Controller Design
This section presents the feedback policy design used for real-time control of planar pushing. We control both the analytical and the learned models with a model predictive control (MPC) framework because it places few restrictions on the algebraic form of the model and makes it possible to enforce state and action constraints. The model predictive controller acts by simulating the model forward and finding an open-loop sequence of control inputs that brings the system close to a desired trajectory. By re-solving this optimization in real time and applying only the first action of the control sequence, this strategy can act as an effective closed-loop stabilizing policy.
Model Predictive Control (MPC): Given the current error state and a nominal trajectory (, ), solve
(4)  
subject to  
with integration time step , and . The terms , , and denote weight matrices associated with the error state, the final error state, and the control input. The optimization is performed by linearizing the dynamics of the system about a desired nominal trajectory with and . The nominal trajectory is computed using the analytical model with sticking interactions to avoid hybridness as done in Zhou and Mason [21].
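To make the receding-horizon idea concrete, the sketch below solves an unconstrained finite-horizon quadratic tracking problem for linearized error dynamics and returns only the first input of the optimal sequence. It is a simplified stand-in rather than the controller used in the experiments: the matrices `A` and `B` (a toy double integrator), the weights, the horizon, and the function name `mpc_first_input` are assumptions, and the state and input constraints of (4) are omitted so that the problem reduces to a single linear solve.

```python
import numpy as np

def mpc_first_input(A, B, Q, Q_N, R, e0, horizon):
    """First input of the sequence minimizing a quadratic tracking cost.

    Error dynamics: e[k+1] = A e[k] + B u[k].
    Cost: sum of e[k]' Q e[k] for k = 1..N-1, plus e[N]' Q_N e[N],
    plus sum of u[k]' R u[k] for k = 0..N-1.
    """
    n, m = B.shape
    # Stack the predictions e[1..N] as E = Sx @ e0 + Su @ U.
    Sx = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(horizon)])
    Su = np.zeros((horizon * n, horizon * m))
    for row in range(horizon):
        for col in range(row + 1):
            block = np.linalg.matrix_power(A, row - col) @ B
            Su[row * n:(row + 1) * n, col * m:(col + 1) * m] = block
    Q_bar = np.kron(np.eye(horizon), Q)
    Q_bar[-n:, -n:] = Q_N  # terminal weight on the final error state
    R_bar = np.kron(np.eye(horizon), R)
    # Setting the gradient of the quadratic cost to zero gives H U = -g.
    H = Su.T @ Q_bar @ Su + R_bar
    g = Su.T @ Q_bar @ Sx @ e0
    U = np.linalg.solve(H, -g)
    return U[:m]

# Closed-loop simulation on a toy double integrator (hypothetical system).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, Q_N, R = np.eye(2), np.eye(2), 0.1 * np.eye(1)

e = np.array([1.0, 0.0])
for _ in range(100):
    u = mpc_first_input(A, B, Q, Q_N, R, e, horizon=20)
    e = A @ e + B @ u  # apply only the first action, then re-solve
```

With constraints included, the same stacked formulation would be passed to a quadratic programming solver instead of `np.linalg.solve`.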
The model predictive approach offers the flexibility to test the same controller design on both the analytical and the data-driven model. The controller design formulation for both settings is described below.
4.1 Analytical Model
The analytical model uses the control input along with the motion equations (1). We include a constraint on to keep the pusher within the object’s edge.
Due to the nature of frictional contacts, as described earlier in Section 3.1, the input constraints are hybrid and not amenable to conventional MPC designed for continuous systems. In this paper, we follow the Family of Modes (FOM) heuristic, as introduced in Hogan and Rodriguez [1], to address this problem.
4.2 Data Driven Model
The data-driven model is more amenable to MPC than the analytical model, as it presents continuous differential equations. As such, no particular care needs to be taken with regard to system hybridness and selecting mode sequences. The control input is given directly by the velocity of the pusher in the body frame of the object along with the motion equations (3) linearized about the nominal trajectory.
To address the stochastic nature of GPs, we make use of the certainty equivalent approximation [22], which replaces random variables with their expected values during the optimization process. In the case of the pusher-slider system modeled with GPs, this implies that the mean of the dynamics is propagated forward by setting the state noise value to . This approximation is computationally beneficial since it converts a stochastic optimization problem into a deterministic one. It has been shown to produce good results for linear systems, where the certainty equivalence principle is optimal for systems with additive Gaussian noise.
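The effect of this approximation can be checked numerically on a small example. The sketch below, which assumes a hypothetical stable linear system with additive Gaussian noise (not the pusher-slider model), compares a noise-free rollout with the average of many noisy rollouts; for linear dynamics the two coincide up to sampling error. The system matrices, noise level, and function name are illustrative assumptions.

```python
import numpy as np

def rollout(A, B, x0, inputs, noise_std=0.0, rng=None):
    """Simulate x[k+1] = A x[k] + B u[k] + w[k] and return the full trajectory."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for u in inputs:
        w = np.zeros_like(x)
        if noise_std > 0.0:
            w = rng.normal(0.0, noise_std, size=x.shape)
        x = A @ x + B @ u + w
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical stable linear system and a fixed open-loop input sequence.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
x0 = np.array([1.0, -0.5])
inputs = [np.array([0.2])] * 10

# Certainty equivalent rollout: the noise is replaced by its mean (zero).
ce_traj = rollout(A, B, x0, inputs)

# Monte Carlo estimate of the mean trajectory under additive Gaussian noise.
rng = np.random.default_rng(0)
noisy = [rollout(A, B, x0, inputs, noise_std=0.05, rng=rng) for _ in range(2000)]
mc_mean = np.mean(noisy, axis=0)

max_gap = np.max(np.abs(mc_mean - ce_traj))
```

For a nonlinear learned model the mean of the noisy rollouts would generally differ from the noise-free rollout, which is why the approach is described as an approximation.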
5 Results
This section evaluates the performance of the analytical and data-driven controllers based on their ability to follow a given track. The goal of the task is to accurately control the motion of the object along a desired timed trajectory using a point robotic pusher.
5.1 Experimental Setup
Figure 3 depicts the robotic setup and the two target trajectories investigated during the experiments. We use an industrial robotic manipulator (ABB IRB 120) along with a Vicon camera system to track the position of the object as in [23]. The material used for the support surface is plywood with an estimated frictional coefficient of . The estimated coefficient of friction of the pusher-object interactions is and the object is a square of length mm and kg of mass.
The parameters of each controller are tuned to obtain their best performance on the 8-track trajectory at mm/s. For the data-driven controller, the parameters are only optimized for the model with 5000 data points. The MPC parameters for the analytical model are , , associated with the state and associated with . For the learned model, we use , , and , associated with and , respectively. Both controllers consider time increments of s and time steps.
5.2 Model comparison
Table 1 compares the performance of the analytical and the data-driven controller designs on a series of trajectory tracking problems. The performance of each controller is measured by computing the average mean squared error over the duration of the experiment between the desired trajectory and the actual motion of the object’s geometric center (see Fig. 4). In this section, we conduct four benchmark experiments for each controller: 1. low tracking velocity ( mm/s) without external perturbations, 2. high tracking velocity (mm/s) without external perturbations, 3. high tracking velocity (mm/s) with external perturbations, and 4. square tracking at mm/s. Two perturbation types are considered in order to test the robustness of the controllers: tangential and normal. Tangential perturbations are applied laterally to the motion of the object by perturbing the initial position of the contact point from its desired position. Normal perturbations are applied orthogonally to the object’s motion by moving the object away from the robotic pusher by mm.
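The error metric described above can be sketched as follows. The text does not state whether a square root is taken before reporting values in mm, so this hypothetical helper returns both the mean squared error and its square root; the function name and the sample points are assumptions, and the two trajectories are assumed to be sampled at the same time instants.

```python
import numpy as np

def tracking_errors(desired_xy, actual_xy):
    """Mean squared and root mean squared distance between two sampled paths.

    Both inputs are (T, 2) arrays of planar positions of the object's
    geometric center, sampled at the same T time instants.
    """
    desired_xy = np.asarray(desired_xy, dtype=float)
    actual_xy = np.asarray(actual_xy, dtype=float)
    sq_dist = np.sum((actual_xy - desired_xy) ** 2, axis=1)
    mse = float(np.mean(sq_dist))
    return mse, float(np.sqrt(mse))

# Two samples with position errors of 3 and 4 (same unit as the inputs).
mse, rmse = tracking_errors([[0.0, 0.0], [1.0, 0.0]], [[0.0, 3.0], [1.0, 4.0]])
```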
Trajectory  Error (Analytical)  Error (Data-Driven) 

8-track no perturbation, vmm/s  mm  mm 
8-track no perturbation, vmm/s  mm  mm 
8-track normal perturbation, vmm/s  mm  mm 
8-track tangential perturbation, vm/s  mm  mm 
Square trajectory, vmm/s  mm  mm 
Table 1 summarizes the tracking performance results for the analytical and data-driven (5000 datapoints) controllers. Both controller designs successfully achieve closed-loop tracking within mm accuracy when no external perturbations are applied. It is worth noting that although the data-driven controller performs only marginally better than its analytical counterpart, its design is much simpler to implement, as it relies on a continuous dynamical model and does not suffer from discontinuous dynamics.
5.3 Influence of data
To evaluate the amount of data required to perform the trajectory tracking task, we run the controller for the data-driven model with an increasing quantity of training data: . We consider the case of high velocity (mm/s) as it is the most challenging for the controller design. Figure 5 shows that around two hundred points are sufficient to get an accuracy equivalent to that of the analytical model.
As the amount of data decreases, the control parameters remain those tuned for the model with data points and are kept equal across models. The reduced datasets are nested, such that smaller datasets are always contained within bigger ones.
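One simple way to obtain nested random subsets of this kind is to draw a single random permutation of the full dataset and take prefixes of it. The sketch below illustrates this; the function name, the seed, and the subset sizes are hypothetical choices, and the paper does not specify how its subsets were generated.

```python
import numpy as np

def nested_subsets(n_total, sizes, seed=0):
    """Return index arrays such that each smaller subset is contained in every larger one."""
    if max(sizes) > n_total:
        raise ValueError("subset size exceeds the number of available samples")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    # Prefixes of one fixed permutation are nested by construction.
    return {size: order[:size] for size in sorted(sizes)}

# Hypothetical sizes drawn from a pool of 5000 recorded pushes.
subsets = nested_subsets(5000, [10, 50, 200])
```

Each index array can then be used to select the rows of the training inputs and outputs for one GP model.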
Figure 5 reports the effect of increasing the size of the data used to train the GP model. Results show that the tracking performance increases as the model accuracy improves. Perhaps most surprising is that closed-loop performance was possible for the data-driven model with as few as data points with an accuracy of mm. This illustrates that feedback control can work with very simple dynamical models that capture the essential behavior of the dynamics. By continuously reassessing the state of the object and recomputing the control action in real time, the controller can correct previous mistakes and track the desired trajectory.
For GP models trained on random datasets with fewer than points, the controller was unstable, i.e., unable to complete the entire lap while keeping the object in contact with the pusher. The data used for training the GP model in Fig. 5 is sampled at random from the available dataset of experimental pushes. Interestingly, as later discussed in Section 6, when we carefully select the data used for training the GP, the controller can work with as few as training points. This result on the lower limit for closed-loop stable tracking was unexpected and shows that data efficiency can be achieved through a model-based control approach, where the model is directly learned from data.
6 Discussion
This paper explores a data-driven approach to control planar pushing tasks. Results show that learning the dynamics of the system from data and controlling the resulting model can be a data-efficient approach, and can achieve reactive and accurate robot/object interactions. We choose an algebraic form for the model that is continuously differentiable (GP), which makes it more amenable to tools from control theory. The hyperparametrization of the GP model as a function of the data ensures that even if the task has hybrid dynamics, the hybridness is only explicit in the data distribution but not in the algebraic form. In practice, we have found that a minimal amount of data can lead to a stable controller, implying that approximate models of contact interactions can be effective when combined with feedback. Given a set of fixed control parameters, the presented data-driven controller design was able to track both the 8-track and square trajectories shown in Fig. 3 at varying speeds, and for different amounts of data and perturbations for the case of the 8-track. This ability to generalize to new tasks is an advantage of model-based control formulations where the model learned can be adapted to new scenarios by leveraging physics.
In this paper, we aimed to answer questions regarding the learning and control representations that are more appropriate for reactive manipulation tasks. What is the level of complexity that should be captured by the motion model? How much data is required? Should system hybridness be explicitly included in the motion model? Our key findings are:

Learning a model for control is easier than learning a model for accurate simulation. We are able to get high-performance control using a model trained on as few as hand-selected datapoints or datapoints selected at random. This result confirms what is no surprise to the control community: approximate simple models can enable powerful control mechanisms. We have shown that these simple models can be learned from data.

The combination of GP model learning and model predictive control yields a practical and data-efficient framework to learn and control motion models. By stabilizing the motion of an object about a predetermined trajectory, MPC only requires the learned model to be accurate to first order by exploiting feedback.

By formulating the model in velocity space, the hybridness inherent to frictional contacts (see Fig. 1(b)) softens, as the implicit dependence on reaction forces is hidden. This smoothing of the dynamics offers major controller design advantages without compromising the accuracy of the resulting controllers.
A limitation of the proposed methodology is that it relies on tracking a nominal trajectory both in state and actuation spaces. These are required when linearizing the system’s dynamics and are obtained in this paper by relying on planning with respect to the analytical model. Another limitation is that the object and the robot are always assumed to be in contact. As such, the controller cannot reason about separation and does not have the ability to switch sides to have better controllability on the object. These generalizations would yield a combinatorial input space, which would be difficult to handle with a regular GP.
To further improve the performance of the system, we believe that online adaptation of the nominal trajectory (feedforward control) by performing iterative learning control [24] can achieve more aggressive and accurate pushing actions. Most importantly, we are interested in applying the learning control methodology presented in this paper to manipulation problems of higher complexity, with more contact formations and higher degrees of freedom.
References
 Hogan and Rodriguez [2016] F. Hogan and A. Rodriguez. Feedback Control of the Pusher-Slider System: A Story of Hybrid and Underactuated Contact Dynamics. In WAFR, 2016.
 Mason [1986] M. T. Mason. Mechanics and Planning of Manipulator Pushing Operations. IJRR, 5(3), 1986.
 Goyal et al. [1991] S. Goyal, A. Ruina, and J. Papadopoulos. Planar Sliding with Dry Friction Part 1. Limit Surface and Moment Function. Wear, 143, 1991.
 Lynch et al. [1992] K. M. Lynch, H. Maekawa, and K. Tanie. Manipulation and active sensing by pushing using tactile feedback. In IROS, 1992.
 Salganicoff et al. [1993] M. Salganicoff, G. Metta, A. Oddera, and G. Sandini. A vision-based learning method for pushing manipulation. Technical report, IRCS9347, U. of Pennsylvania, Department of Computer and Information Science, 1993.
 Walker and Salisbury [2008] S. Walker and J. K. Salisbury. Pushing Using Learned Manipulation Maps. In ICRA, 2008.
 Lau et al. [2011] M. Lau, J. Mitani, and T. Igarashi. Automatic Learning of Pushing Strategy for Delivery of Irregular-Shaped Objects. In ICRA, 2011.
 Meric et al. [2015] T. Meric, M. Veloso, and H. Akin. Push-manipulation of complex passive mobile objects using experimentally acquired motion models. Autonomous Robots, 38(3), 2015.
 Zhou et al. [2016] J. Zhou, R. Paolini, J. A. Bagnell, and M. T. Mason. A Convex Polynomial Force-Motion Model for Planar Sliding: Identification and Application. In ICRA, 2016.
 Chavan-Dafle and Rodriguez [2017] N. Chavan-Dafle and A. Rodriguez. Sampling-based planning for in-hand manipulations with external pushes. International Symposium on Robotics Research (ISRR). Under Review, 2017.
 Zhou et al. [2017] J. Zhou, R. Paolini, A. M. Johnson, J. A. Bagnell, and M. T. Mason. A probabilistic planning framework for planar grasping under uncertainty. IEEE Robotics and Automation Letters, 2(4):2111–2118, Oct 2017. doi: 10.1109/LRA.2017.2720845.
 Arruda et al. [2017] E. Arruda, M. J. Mathew, M. Kopicki, M. Mistry, M. Azad, and J. L. Wyatt. Uncertainty averse pushing with model predictive path integral control. In 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pages 497–502, Nov 2017. doi: 10.1109/HUMANOIDS.2017.8246918.
 Pinto and Gupta [2017] L. Pinto and A. Gupta. Learning to push by grasping: Using multiple tasks for effective learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 2161–2168, May 2017. doi: 10.1109/ICRA.2017.7989249.
 Zeng et al. [2018] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. A. Funkhouser. Learning synergies between pushing and grasping with self-supervised deep reinforcement learning. CoRR, abs/1803.09956, 2018.
 Finn et al. [2016] C. Finn, I. Goodfellow, and S. Levine. Unsupervised learning for physical interaction through video prediction. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 64–72, USA, 2016. Curran Associates Inc. ISBN 9781510838819. URL http://dl.acm.org/citation.cfm?id=3157096.3157104.
 Ebert et al. [2017] F. Ebert, C. Finn, A. X. Lee, and S. Levine. Self-supervised visual planning with temporal skip connections. arXiv preprint arXiv:1710.05268, 2017.
 Agrawal et al. [2016] P. Agrawal, A. V. Nair, P. Abbeel, J. Malik, and S. Levine. Learning to poke by poking: Experiential learning of intuitive physics. In Advances in Neural Information Processing Systems, pages 5074–5082, 2016.
 Bauza and Rodriguez [2017] M. Bauza and A. Rodriguez. A probabilistic data-driven model for planar pushing. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 3008–3015. IEEE, 2017.
 Rasmussen and Williams [2006] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
 Yu et al. [2016] K.-T. Yu, M. Bauza, N. Fazeli, and A. Rodriguez. More than a Million Ways to be Pushed. A High-Fidelity Experimental Data Set of Planar Pushing. In IROS, 2016.
 Zhou and Mason [2017] J. Zhou and M. T. Mason. Pushing revisited: Differential flatness, trajectory planning and stabilization. In Proceedings of the International Symposium on Robotics Research (ISRR), 2017.
 Bertsekas [1995] D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995.
 Hogan et al. [2018] F. R. Hogan, E. R. Grau, and A. Rodriguez. Reactive Planar Manipulation with Convex Hybrid MPC. In ICRA, 2018.
 Bristow et al. [2006] D. A. Bristow, M. Tharayil, and A. G. Alleyne. A survey of iterative learning control. IEEE Control Systems, 26(3):96–114, 2006.