Let's Push Things Forward: A Survey on Robot Pushing

by Jochen Stüber, et al.

As robots make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. In this regard, pushing is an essential motion primitive that dramatically extends a robot's manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. Beginning with analytical approaches, under which we also subsume physics engines, we then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches, which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.



2 Keywords:

robotics, pushing, manipulation, forward models, motion prediction

3 Introduction

We argue that pushing is an essential motion primitive in a robot’s manipulative repertoire. Consider, for instance, a household robot reaching for a bottle of milk located in the back of the fridge. Instead of picking up every yoghurt, egg carton, or jam jar obstructing the path to create space, the robot can use gentle pushes to create a corridor to its lactic target. Moving larger obstacles out of the way is even more important to mobile robots in environments as extreme as abandoned mines (Ferguson et al., 2004), the moon (King, 2016), or in rescue missions such as the one at the Fukushima Daiichi Nuclear Power Plant. In order to save cost, space, or reduce payload, such robots are often not equipped with grippers, meaning that prehensile manipulation is not an option. Even in the presence of grippers, objects may be too large or too heavy to grasp.

In addition to the considered scenarios, pushing has numerous beneficial applications that come to mind less easily. For instance, pushing is effective at manipulating objects under uncertainty (Brost, 1988; Dogar and Srinivasa, 2010), and for pre-grasp manipulation, allowing robots to bring objects into configurations where they can be easily grasped (King et al., 2013). Less existential, yet highly interesting and entertaining, dexterous pushing skills are also widely applied and applauded in robot soccer (Emery and Balch, 2001).

Humans perform skilful manipulation tasks from an early age, and are able to transfer behaviours learned on one object to objects of novel sizes, shapes, and physical properties. For robots, achieving those goals is challenging. For one thing, frictional forces are usually unknown but play a significant role in pushing (Zhou et al., 2016). Furthermore, the dynamics of pushing are highly non-linear, with literal tipping points, and sensitive to initial conditions (Yu et al., 2016). The large body of work on robotic pushing has nevertheless produced many accurate models for predicting the outcome of a push, some analytical, some data-driven. However, models that generalise to novel objects are scarce (Kopicki et al., 2017; Stüber et al., 2018), highlighting the demanding nature of the problem.

In this paper, we review the robotic pushing literature. We focus on work concerned with making predictions of the motion of pushed objects, but we also cover relevant applications of pushing for planning and control. We begin with analytical approaches, under which we also subsume physics engines. We then proceed to discuss data-driven approaches as well as deep learning approaches, which have recently become very popular in the literature.

4 Problem Statement

Figure 1: An inverse model computes an action which will affect the environment such that the next desired state (or configuration) is achieved from the current state.
Figure 2: A forward model makes a prediction on how an action will affect the current state of the environment by returning the configuration after the action is taken.

Even in ideal conditions, such as structured environments where an agent has a complete model of the environment and perfect sensing abilities, the problems of robotic grasping and manipulation are not trivial. By complete model of the environment we mean that physical and geometric properties of the world, such as pose, shape, friction parameters and the mass of the object we wish to manipulate, are exactly known. In fact, the object to be manipulated is indirectly controlled by contacts with a robot manipulator (e.g. pushing by a contacting finger part), and an inverse model (IM), which computes an action to produce the desired motion or set of forces on the object, may not be known. Sometimes forward models (FM) may be fully or partially known, even where IMs are not available. In such cases, an FM can be used to estimate the next state of a system, given the current state and a set of executable actions. This enables planning to be achieved by imagining the likely outcomes from all possible manipulative actions, and then choosing the action which achieves the most desirable end state. Figures 1 and 2 show a graphical representation of these two models. However, the manipulation and grasping problem is typically defined in continuous state and action spaces, hence it is computationally intractable to build an optimal sequence of actions, or plan, by exploring all possible action-state combinations.
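The idea of planning by imagining outcomes with an FM can be illustrated with a minimal sketch. It assumes a toy forward model in which a point object in the plane moves by a fixed fraction of the commanded push. The names `forward_model` and `choose_action`, the gain of 0.8, and the eight-direction action set are hypothetical choices made for illustration and are not taken from any of the surveyed papers.

```python
import math

def forward_model(state, action, gain=0.8):
    """Toy FM: predict the next (x, y) position after a push (dx, dy).

    The gain < 1 is an assumed stand-in for effects such as slip.
    """
    x, y = state
    dx, dy = action
    return (x + gain * dx, y + gain * dy)

def choose_action(state, goal, candidate_actions):
    """Pick the action whose predicted outcome lies closest to the goal."""
    def cost(action):
        nx, ny = forward_model(state, action)
        return math.hypot(goal[0] - nx, goal[1] - ny)
    return min(candidate_actions, key=cost)

# Assumed discretised action set: eight unit pushes, 45 degrees apart.
ACTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]

if __name__ == "__main__":
    print(choose_action((0.0, 0.0), (2.0, 2.0), ACTIONS))
```

Discretising the action set, as done here, is one way to sidestep the intractability of continuous action spaces, at the price of possibly missing the best action.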

Even more challenging is the problem of grasping and manipulation in unstructured environments, where these ideal conditions do not exist. There are several reasons why an agent may fail to build a complete description of the state of the environment: sensors are noisy, robots are difficult to calibrate, actions’ outcomes are unreliable due to unmodelled variables (e.g. friction, mass distribution). Uncertainty can be modelled in several ways, but in the case of manipulation there are typically two types of uncertainty:

  • Uncertainty in physical effects: occurs when the robot acts on external bodies via physical actions (e.g., contact operations). This interaction transforms the current state of the world according to physical laws which are not fully predictable. For example, a pushed object may slide, rotate or topple with complex motions which are extremely difficult to predict, and involve physical parameters which may not be known. We can think of this as uncertainty on future states.

  • Uncertainty in sensory information: occurs when some of the quantities that define the current state of the world are not directly accessible to the robot. Hence, strategies are needed that allow the robot to complete tasks in partial ignorance by recovering knowledge of its environment. In such cases, there is uncertainty about how much new information will be yielded during the execution of a new robotic action.

This paper is concerned with the evolution of FMs and their application in robotics. Table 1 summarises the literature at a glance. The papers are classified according to the type of approach implemented. We identify the following six classes.

  1. Purely analytical. It is mostly seminal work drawn from classical mechanics.

  2. Hybrid. It extends analytical approaches with data-driven methods. Whilst the interactions between objects are still represented analytically, some quantities of interest are estimated based on observations, e.g. the coefficients of friction.

  3. Dynamic analysis. It integrates dynamics in the model.

  4. Physics engines. It employs a physics engine as a “black box” to make predictions about the interactions.

  5. Data-driven. It learns how to predict physical interaction from examples.

  6. Deep learning. Like the data-driven approaches, it learns how to construct an FM from examples. The key insight is that the deep learning approaches are based on feature extraction.

The features highlighted for each approach are as follows.

  • The assumptions made by the authors on their approach. We highlight i) the quasi-static assumption in the model, ii) if it is a seminal work on 2D shapes, and iii) if the method required a known model of the object to be manipulated.

  • The type of motion analysed in the paper, such as 1D, planar (2D translation and 1D rotation around the axis), or full 3D (3D translation and 3D rotation).

  • The aim of the paper. We distinguish between predicting the motion of the object, estimating physical parameters, planning pushes, and analysing a push to reach a stable grasp.

  • The model. We distinguish between analytical, constructed from data, and by using a physics simulator.

5 Analytical Approaches

5.1 Quasi-Static Planar Pushing

Early work on robotic pushing focused on the problem of quasi-static planar pushing of sliding objects. In a first phase, several researchers, following pioneering work by Matthew T. Mason, approached the problem analytically, explicitly modelling the objects involved and their physical interactions whilst drawing on theories from classical mechanics. More recently, this tradition has moved to extend analytical models with more data-driven methods.

Column groups: Assumptions, Motion, Aim, Model.

PA: Mason (1982); Mason (1986b); Peshkin and Sanderson (1988a, b); Goyal et al. (1991); Alexander and Maddocks (1993); Lee and Cutkosky (1991); Lynch et al. (1992); Howe and Cutkosky (1996); Mason (1990); Mayeda and Wakatsuki (1991); Akella and Mason (1992, 1998); Narasimhan (1994); Lynch and Mason (1996); Agarwal et al. (1997); Nieuwenhuisen et al. (2005); de Berg and Gerrits (2010); Miyazawa et al. (2005); Cappelleri et al. (2006); Dogar and Srinivasa (2011); Cosgun et al. (2011); Lee et al. (2015); King (2016)
HD: Lynch (1993); Yoshikawa and Kurisu (1991); Ruiz-Ugalde et al. (2010, 2011); Zhu et al. (2017); Bauzá and Rodriguez (2017)
DA: Brost (1992); Jia and Erdmann (1999); Behrens (2013); Chavan-Dafle and Rodriguez (2015)
PE: Zito et al. (2012); Scholz et al. (2014); Zhu et al. (2017)
DD: Moldovan et al. (2012); Ridge et al. (2015); Zrimec and Mowforth (1991); Salganicoff et al. (1993); Walker and Salisbury (2008); Lau et al. (2011); Kopicki et al. (2011, 2017); Stüber et al. (2018); Meriçli et al. (2015)
DL: Denil et al. (2016); Chang et al. (2016); Watters et al. (2017); Fragkiadaki et al. (2015); Ehrhardt et al. (2017); Byravan and Fox (2016); Finn et al. (2016)

Table 1: Summary of the literature at a glance. PA: Purely Analytical; HD: Hybrid; DA: Dynamic Analysis; PE: Physics Engines; DD: Data Driven; DL: Deep Learning.

5.1.1 Purely Analytical Approaches

To briefly introduce the problem, planar pushing (Mason, 1982) refers to an agent pushing an object such that pushing forces lie in the horizontal support plane while gravity acts along the vertical. Both pusher and pushed object move only in the horizontal plane, effectively reducing the world to 2D. Meanwhile, the quasi-static assumption (Mason, 1986b) in this context means that the involved objects’ velocities are small enough that inertial forces are negligible. In other words, objects only move when pushed by the robot. Instantaneous motion is then the consequence of the balance between contact forces, frictional forces, and gravity. The quasi-static assumption makes the problem more tractable, yielding simpler models. A key challenge in predicting the motion of a pushed object under manipulation is that the distribution of pressure at the contact between object and supporting surface is generally unknown. Hence, the system of frictional forces that arise at that contact is also indeterminate (Mason, 1982).

Figure 3: The slider (blue) is a rigid object in the plane, and its configuration space comprises 2D translation and one rotation about the axis normal to the plane. The slider is pushed by a rigid pusher (red) at a point or set of points of contact. A world frame is fixed in the plane, and a slider frame is attached to the centre of friction of the slider. The slider configuration describes the position and orientation of the slider frame relative to the world frame. Similarly, a pusher frame is defined and its configuration is computed. On the right side of the figure, the relation between the unit motion vector and the centre of rotation is described by the projection shown from the unit motion sphere to the tangent planes (one for each rotation sense). The line at the equator of the sphere represents translations. Reproduced from Lynch and Mason (1996).

Mason (1982, 1986a) started the line of work on pushing, proposing the voting theorem as a fundamental result. It allows one to find the sense of rotation of a pushed object given the pushing direction and the centre of friction without requiring knowledge of the pressure distribution. Drawing on this seminal work, Peshkin and Sanderson (1988a, b) found bounds on the rotation rate of the pushed object given a single-point push. Following that, Goyal et al. (1991) introduced the limit surface, which describes the relationship between the motion of a sliding object and the associated support friction given that the support distribution is completely specified. Under the quasi-static assumption, the limit surface makes it possible to convert the generalised force applied by a pusher at a contact to the instantaneous generalised velocity of the pushed object. Alexander and Maddocks (1993) considered the case when only the geometric extent of the support area is known, and described techniques to bound the possible motions of the pushed object. While the limit surface provides a powerful tool for determining the motion of a pushed object, there exists no convenient explicit form to construct it. In response to this challenge, Lee and Cutkosky (1991) proposed to approximate the limit surface as an ellipsoid to improve computational time. However, their approximation requires knowledge of the pressure distribution. Marking a milestone of planar pushing research, Lynch et al. (1992) applied the ellipsoidal approximation to derive a closed-form analytical solution for the kinematics of quasi-static single-point pushing, including both sticking and sliding behaviours. Subsequently, Howe and Cutkosky (1996) explored further methods for approximating limit surfaces, including guidance for selecting the appropriate approach based on the pressure distribution, computational cost, and accuracy.
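To make the role of the ellipsoidal limit surface more concrete, the following minimal sketch maps a planar wrench to the direction of the resulting slider velocity. It assumes the common axis-aligned form of the ellipsoid with a maximum friction force `f_max` and a maximum friction moment `m_max`, and it uses the property that the generalised velocity is normal to the limit surface at the applied wrench. The function names and the numerical values are hypothetical and serve illustration only.

```python
def contact_wrench(force, contact_point):
    """Wrench (fx, fy, m) about the centre of friction for a planar point force."""
    fx, fy = force
    px, py = contact_point  # contact position relative to the centre of friction
    return fx, fy, px * fy - py * fx  # planar cross product gives the moment

def twist_direction(wrench, f_max, m_max):
    """Direction of the slider twist (vx, vy, omega) under the ellipsoidal model.

    The limit surface is approximated as
        (fx / f_max)^2 + (fy / f_max)^2 + (m / m_max)^2 = 1,
    and the twist is taken parallel to the surface normal at the wrench.
    Only the ratios between the components are meaningful here; the actual
    speed is determined by the motion of the pusher.
    """
    fx, fy, m = wrench
    return fx / f_max ** 2, fy / f_max ** 2, m / m_max ** 2

if __name__ == "__main__":
    # A push along +x applied below the centre of friction (assumed values).
    w = contact_wrench((1.0, 0.0), (0.0, -0.1))
    print(twist_direction(w, f_max=2.0, m_max=0.5))
```

In this sketch, a force whose line of action passes through the centre of friction yields zero moment and hence a pure translation, while an offset force yields a rotation whose sign follows the sign of the moment.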

Results on the mechanics of planar pushing have been used for planning and control of manipulator pushing operations. To begin with, Mason (1990) showed how to synthesize robot pushing motions to slide a block along a wall, a problem later also studied by Mayeda and Wakatsuki (1991). Akella and Mason (1992, 1998) analysed the series of pushes needed to bring a convex polygon to a desired configuration. Narasimhan (1994) and Kurisu and Yoshikawa (1995) studied the problem of moving an object among obstacles by pushing with point contact. Lynch and Mason (1996) comprehensively studied stable pushing of a planar object with a fence-shaped finger, considering mechanics, control, and planning. First, they derived conditions for stable edge pushing, considering the case where the object will remain attached to the pusher without slipping or breaking contact. Based on this result, they then used best-first search to find a path to a specified goal location. Agarwal et al. (1997) proposed an algorithm for computing a contact-preserving push plan for a point-sized pusher and a disk-shaped object. Nieuwenhuisen et al. (2005) utilised compliance of manipulated disk-shaped objects against walls to guide their motion. They presented an exact planning algorithm for 2D environments consisting of non-intersecting line segments. Subsequently, de Berg and Gerrits (2010) improved this approach from a computational perspective and presented push planning methods both for the contact-preserving case and less restrictive scenarios. Miyazawa et al. (2005) used a rapidly-exploring random tree (RRT) (LaValle, 1998) for planning non-prehensile manipulation, including pushing, of a polyhedron with three degrees of freedom (DOF) by a robot with spherical fingers. They do not allow for sliding and rolling of robot fingers on the object surface.

Cappelleri et al. (2006) have solved a millimetre scale 2D version of the peg in the hole problem, using Mason’s models for quasi-static manipulation and an RRT-based approach for planning a sequence of pushes. Similar to potential-field-based motion planners developed by Khatib (1986), Igarashi et al. (2010) proposed a method that computes dipole-like vector fields around circular objects that guide the motion of a robot with a circular manipulator.

Figure 4: Left. Planar pushing system with a world frame and a slider (blue) with its own frame, as described in Fig. 3. The pusher (red) interacts with the slider at one point of contact. It applies a normal force, a tangential friction force, and a torque about the centre of mass. The normal force acts along the normal vector at the contact point between pusher and slider, and the angle of the friction cone is determined by the coefficient of friction. Two further terms describe the normal and the tangential distance, respectively, between the pusher and the centre of friction of the slider. Right. Coulomb’s frictional law for the planar pushing system on the left. Coulomb’s law bounds the magnitude of the tangential force by the coefficient of friction times the normal force. Three contact modes are defined: 1. sliding right, in which friction acts as a force constraint; 2. sticking, in which friction acts as a kinematic constraint; and 3. sliding left, in which friction acts as a force constraint. Reproduced from Hogan et al. (2018).
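The Coulomb friction relation and the three contact modes named in the caption of Figure 4 can be sketched in a few lines. The sketch assumes that the tangential force needed to prevent slip at the contact is already known, which in practice has to be derived from the pusher and slider motion. The function names and mode labels are hypothetical, and whether a saturated positive tangential force corresponds to sliding left or sliding right depends on a sign convention that is not fixed here.

```python
import math

def friction_cone_half_angle(mu):
    """Half-angle of the friction cone for a coefficient of friction mu."""
    return math.atan(mu)

def contact_mode(f_n, f_t_required, mu):
    """Classify a point contact and return (mode, transmitted tangential force).

    f_n           normal force at the contact (non-negative)
    f_t_required  tangential force that would be needed to prevent slip
    mu            coefficient of friction between pusher and slider
    """
    if f_n < 0:
        raise ValueError("normal force must be non-negative")
    limit = mu * f_n
    if abs(f_t_required) <= limit:
        # Inside the cone: no relative sliding, friction is a kinematic constraint.
        return "sticking", f_t_required
    # On the cone boundary: the tangential force saturates (force constraint).
    if f_t_required > 0:
        return "sliding_positive", limit
    return "sliding_negative", -limit

if __name__ == "__main__":
    print(contact_mode(10.0, 2.0, 0.5))
    print(contact_mode(10.0, 8.0, 0.5))
```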

More recently, Dogar and Srinivasa (2011) employed the ellipsoidal approximation of the limit surface to plan robust push-grasp actions for dexterous hands and used them for rearrangement tasks in clutter. To use results for planar pushing, they assumed that objects do not topple easily. Furthermore, they assumed that the robot has access to 3D models of the objects involved. Cosgun et al. (2011) presented an algorithm for placing objects on cluttered table surfaces, thereby constructing a sequence of manipulation actions to create space for the object. However, focusing on planning, in their 2D manipulation they simply push objects at their centre of mass in the desired direction. Lee et al. (2015) presented a hierarchical approach to planning sequences of non-prehensile and prehensile actions, proceeding in three stages. First, they find a sequence of qualitative contact states of the moving object with other objects, then a feasible sequence of poses for the object, and lastly a sequence of contact points for the manipulators on the object. King (2016) developed a series of push planners for open-loop non-prehensile rearrangement tasks in cluttered environments. Before considering more complex scenarios, they used a simple analytical approach for forward-simulation of randomly sampled time-discrete controls within an RRT-based planner. They tested their planners on two real robotic platforms, the home care robot HERB with a seven DOF arm, and the NASA rover K-Rex.
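The general pattern of forward-simulating randomly sampled controls inside an RRT-based planner can be sketched as follows. This is not the planner of King (2016): the forward model is a toy in which a point object moves by a fixed fraction of the push, there are no obstacles, and all names, bounds, and parameter values are assumptions made for illustration.

```python
import math
import random

def forward_model(state, control, gain=0.8):
    """Toy FM: the object moves by gain * push length along the push direction."""
    angle, length = control
    return (state[0] + gain * length * math.cos(angle),
            state[1] + gain * length * math.sin(angle))

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest(nodes, target):
    """Index of the tree node closest to the target state."""
    return min(range(len(nodes)), key=lambda i: dist(nodes[i], target))

def plan_pushes(start, goal, rng, iters=2000, n_controls=10, tol=0.1, goal_bias=0.1):
    """Grow a tree of object states; return the state path to the goal or None."""
    nodes, parents = [start], [None]
    for _ in range(iters):
        # Sample a target state, occasionally the goal itself.
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.uniform(-0.5, 1.5), rng.uniform(-0.5, 1.5))
        i = nearest(nodes, target)
        # Forward-simulate randomly sampled controls and keep the best outcome.
        controls = [(rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.05, 0.2))
                    for _ in range(n_controls)]
        new = min((forward_model(nodes[i], c) for c in controls),
                  key=lambda s: dist(s, target))
        nodes.append(new)
        parents.append(i)
        if dist(new, goal) <= tol:
            # Walk the parent links back to the root to recover the path.
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parents[j]
            return path[::-1]
    return None

if __name__ == "__main__":
    path = plan_pushes((0.0, 0.0), (1.0, 1.0), random.Random(0))
    print(None if path is None else len(path))
```

Because the planner only queries `forward_model`, the toy model could in principle be exchanged for an analytical pushing model or a physics engine without changing the search loop.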

5.1.2 Complementing Analytical Approaches with Data-Driven Methods

Transitioning to the second phase of planar pushing research, multiple factors have contributed to a shift toward more data-driven approaches. For one thing, much of the previous work makes minimal assumptions regarding the pressure distribution. While convenient, those methods lead to conservative strategies for planning and control, providing only worst-case guarantees. Furthermore, while assumptions regarding the pressure distribution in previous work were often minimal, other strong assumptions were frequently made to derive results analytically. Hence, more recent work has set out to validate common assumptions such as the ubiquitous quasi-static assumption. Additionally, purely analytical models do not take into account the stochastic nature of pushing, in the sense that pushes indistinguishable to sensor and actuator resolution have empirically been found to produce variable results (Yu et al., 2016). Rather than making minimal or strong assumptions about parameters, these parameters can be estimated from observations. Several researchers have explored this approach.

Figure 5: A classical workflow for estimating relevant physical parameters of a pushed object. A robotic pusher performs a set of push operations on an object which is typically tracked using vision. Simpler approaches employ markers on the object for more accurate estimates. An analytical model of the motion of the target object is also employed. Sensory data and physical principles are the inputs of the estimator. As output, the estimator provides an estimate of the desired parameters, e.g. friction distribution or centre of mass. In Lynch (1993) the estimated parameters are also used for recognising objects based on their (estimated) physical properties.

Lynch (1993) presented methods both for estimating the relevant friction parameters by performing experimental pushes, and for recognising objects based on their friction parameters. Similarly, Yoshikawa and Kurisu (1991) described how a mobile robot with a visual sensor can estimate the friction distribution of an object and the position of the centre of friction by pushing and observing the result. Yet, both of these approaches discretise the contact patch into grids so that they are either imprecise if the approximation is too coarse or suffer from the curse of dimensionality when using a fine-grained approximation.

Ruiz-Ugalde et al. (2010, 2011) formulated a compact mathematical model of planar pushing. Assuming that the object’s base shape is given, their robot explored object-table and finger-object friction coefficient parameters. Zhou et al. (2016) developed a method for modelling planar friction, proposing a framework for representing planar sliding force-motion models using convex polynomials. Notably, they also showed that the ellipsoid approximation is a less accurate special case of this representation. Zhou et al. (2017) extended the convex polynomial model to associate a commanded position-controlled end effector motion to the instantaneous resultant object motion. They modelled the probabilistic nature of object-to-surface friction by sampling parameters from a set of distributions. They presented the motion equations for both single and multiple frictional contacts and validated their results with robotic pushing and grasping experiments on the dataset published by Yu et al. (2016). That dataset comprises planar pushing interactions with more than a million samples of positions of pusher and slider, as well as interaction forces. Push interaction is varied along six dimensions, namely surface material, shape of the pushed object, contact position, pushing direction, pushing speed, and pushing acceleration. Using this dataset, Yu et al. (2016) characterised the variability of friction, and evaluated the most common assumptions and simplifications made by previous models of frictional pushing. They provide an insightful table that lists the assumptions and approximations made in much of the work cited in this section. Finally, Bauzá and Rodriguez (2017) used a data-driven approach to model planar pushing interaction to predict both the most likely outcome of a push and, as a novelty, its expected variability. The learned models (also trained on the dataset by Yu et al. (2016)) rely on a variation of Gaussian processes whilst avoiding and evaluating the quasi-static assumption. However, the learned models are specific to the particular object and material. Transfer learning is left for future work.

5.2 Physics Engines and Dynamic Analysis

While the quasi-static assumption may be reasonable in a variety of situations, other problems call for dynamic models of pushing. One popular approach to achieving this is using a physics engine. Before covering this field, we first consider work concerned with dynamic pushing that does not resort to physics engines.

5.2.1 Dynamic Analysis

Using dynamic analysis, Brost (1992) investigated the problem of catching an object by pushing it, i.e. determining the pushing motions that lead to a pusher-object equilibrium. This work was motivated by dealing with uncertainty in positioning, generating plans that work also in the worst case. Jia and Erdmann (1999) investigated dynamic pushing assuming frictionless interaction between pusher and object. Behrens (2013) instead studied dynamic pushing but assumed infinite friction between pusher and object. Chavan-Dafle and Rodriguez (2015) considered planning non-prehensile in-hand manipulation with patch contacts. They described the quasi-dynamic motion of an object held by a set of frictional contacts when subject to forces exerted by the environment. Given a grasp configuration, gripping forces, and the location and motion of a pusher, they estimate both the instantaneous motion of the object and the minimum force required to push the object into the grasp. To this end, complex contact geometries are broken up into rigid networks of point contacts.

5.2.2 Physics Engines

A large body of work related to pushing makes use of physics engines. Commonly used examples of such engines include Bullet Physics, the Dynamic Animation and Robotics Toolkit (DART), MuJoCo, the Open Dynamics Engine (ODE), NVIDIA PhysX, and Havok (Erez et al., 2015). Those engines allow for 3D simulation but 2D physics engines exist, as well, e.g. Box2D. While some physics engines have been designed for graphics and animation, others have been developed specifically for robotics. In the first category, visually-plausible simulations are key while physically-accurate simulations are essential for many robotics applications. Most physics engines today use impulse-based velocity-stepping methods to simulate contact dynamics. As this requires solving NP-hard problems at each simulation step, more tractable convex approximations have been developed, highlighting the trade-off between computational complexity and accuracy present in those engines (Erez et al., 2015). 3D physics engines use a Cartesian representation where each body has six DOF and joints are modelled as equality constraints in the joint configuration space of the bodies. In robotics, where joint constraints are ubiquitous, using generalised coordinates is computationally less expensive and prevents joint constraints from being violated.

For a comparison of physics engines, we refer the reader to two recent studies (Erez et al., 2015; Chung and Pollard, 2016). Erez et al. (2015) compared ODE, Bullet, PhysX, Havok, and MuJoCo. It should be noted that the study was written by the developers of MuJoCo. They introduced quantitative measures of simulation performance and focused their evaluation on challenges common in robotics. They concluded that each engine performs best on the type of system it was tailored to. For robotics, this is MuJoCo, while gaming engines shine in gaming-related trials, so that no engine emerges as a clear winner. Chung and Pollard (2016) compared Bullet, DART, MuJoCo, and ODE with regard to contact simulations whilst focusing on the predictability of behaviour. Their main result is that the surveyed engines are sensitive to small changes in initial conditions, emphasising that parameter tuning is important. Another evaluation of MuJoCo was carried out by Kolbert et al. (2017), who evaluated the contact model of MuJoCo with regard to predicting the motions and forces involved in three in-hand robotic manipulation primitives, among them pushing. In the process, they also evaluated the contact model proposed by Chavan-Dafle and Rodriguez (2015). They found that both models make useful yet not highly accurate predictions. Concerning MuJoCo, they state that its soft constraints increase efficiency but limit accuracy, especially in the cases of rigid contacts and transitions between sticking and slipping at contacts.

Researchers have applied physics engines in multifarious ways to study robotic pushing. To begin with, physics engines have been used in RRT-based planners to forward-simulate pushes. Zito et al. (2012) presented a two-level planner that combines a global RRT planner operating in the configuration space of the object, and a local planner that generates sequences of actions in the robot’s joint space that will move the object between a pair of nodes in the RRT. In this work, the experimental set-up consists of a simulated model of a tabletop robot manipulator with a single rigid spherical fingertip which it uses to push a polyflap (Sloman, 2006) to a goal state. To achieve this, the randomized local planner utilizes a physics engine (PhysX) to predict the object’s pose after a pushing action. Similarly, King (2016) incorporated a dynamic physics engine (Box2D) into an RRT-based planner to model dynamic motions such as a ball rolling. To reduce planning complexity, they considered only dynamic actions that lead to statically stable states, i.e. all considered objects need to come to rest before the next action. Another application of physics engines in robotic pushing was proposed by Scholz et al. (2014). In what they refer to as Physics-Based Reinforcement Learning, an agent uses a physics engine as a model representation. Hence, a physics engine can be seen as a hypothesis space for rigid-body dynamics. They introduced uncertainty using distributions over the engine’s physical parameters and obtained transitions by taking the expectation of the simulator’s output over those random variables. Finally, Zhu et al. (2017) utilised a physics engine for motion prediction, learning the physical parameters through black-box Bayesian optimization. First, a robot performs random pushing actions on an object in a tabletop set-up. Based on those observations, the Bayesian learning algorithm tries to identify the model parameters that maximise the similarity between the simulated and observed outcomes. To support working with different objects, a pre-trained object detector is used that maps observed objects to a library of 3D meshes and estimates the objects’ poses on that basis. Once the physical parameters have been identified, they are used to simulate the results of new actions.
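The two ideas at the end of this paragraph, identifying simulator parameters from observed pushes and predicting by taking an expectation over uncertain parameters, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the closed-form sliding model `simulate_slide` stands in for a real physics engine, a plain search over candidate friction values stands in for Bayesian optimization, and the Gaussian belief over the friction coefficient is hypothetical. It is not the implementation of any of the cited works.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def simulate_slide(v0, mu):
    """Toy stand-in for a physics engine: distance (m) that a block released
    at speed v0 (m/s) slides under Coulomb friction with coefficient mu."""
    return v0 ** 2 / (2.0 * mu * G)


def identify_friction(observations, candidates):
    """Return the candidate mu whose simulated distances best match the
    observed ones (plain search, used here instead of Bayesian optimization)."""
    def loss(mu):
        return sum((simulate_slide(v0, mu) - d) ** 2 for v0, d in observations)
    return min(candidates, key=loss)


def expected_distance(v0, mu_mean, mu_std, n_samples=2000, seed=0):
    """Monte Carlo expectation of the simulator output over a (hypothetical)
    Gaussian belief about mu; samples are clipped to stay positive."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        mu = max(0.05, rng.gauss(mu_mean, mu_std))
        total += simulate_slide(v0, mu)
    return total / n_samples


# Synthetic "observed" pushes generated with a hidden friction coefficient.
true_mu = 0.3
observations = [(v0, simulate_slide(v0, true_mu)) for v0 in (0.2, 0.4, 0.6)]
candidates = [0.05 * k for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00

mu_hat = identify_friction(observations, candidates)
prediction = expected_distance(0.5, mu_hat, 0.03)
print(mu_hat, prediction)
```

In this sketch the expectation is slightly larger than the prediction at the mean friction value, because the sliding distance is a convex function of the friction coefficient.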

Figure 6: Simulation of a Katana robot arm equipped with a spherical finger that plans a sequence of pushes to move an L-shaped object, called a polyflap (Sloman, 2006), to a goal state. The plan is created by using a physics engine (PhysX) to predict the outcome of a push operation. Image 01 shows the initial pose. The wire-framed L-shaped polyflap is a ‘phantom’ to indicate the desired goal state. The goal pose is translated from the initial pose by 28 cm and rotated by 90 degrees. Image 02 shows the collision-free trajectory to bring the end effector to the start pose of the first push. Images 01-04 show the first push which makes the polyflap tip over. Images 05-09 show a series of pushes which culminate in the polyflap resting in an unstable equilibrium pose along its folded edge. Images 12-13 show a sideways push. Images 14-15 show the final frontal push which aligns the polyflap with the target configuration. Courtesy of Zito et al. (2012).

While physics engines offer great value for robotic applications, e.g. by taking into consideration dynamic interaction and 3D objects, they nevertheless require explicit object modelling and extensive parameter tuning. Another approach, which we consider next, is to learn a forward model from data.

6 Learning to Predict from Examples

This part of the literature is based on learning forward models for robotic pushing from data. We first review work on qualitative models and then consider models that make metrically precise predictions. In both of those sections, we do not include work that uses deep learning techniques. We dedicate a separate section to such approaches, given the current research interest in that area and the large number of papers being published.

6.1 Qualitative Models

Much work on qualitative models revolves around the concept of affordances. The term affordance was coined by Gibson (1979) and generally refers to an action possibility that an object or environment provides to an organism. Although it originated in psychology, the concept has been influential in various domains, among them robotics. Şahin et al. (2007) discussed affordances from a theoretical perspective while laying emphasis on their use in autonomous robotics. Min et al. (2016) provided a recent survey of affordance research in developmental robotics. Ugur et al. (2011) considered an anthropomorphic robot that learns object affordances from observations with a range camera and uses them for planning. First, the robot discovers effect categories from its actions. Then, it learns a mapping between object features and affordances which it then employs for planning. Pushing is one type of action that they consider. While much previous work has focused on affordance models for individual objects, Moldovan et al. (2012) learned affordance models for configurations of multiple interacting objects. Their model is capable of generalising over objects and dealing with uncertainty. Ridge et al. (2015) developed a self-supervised online learning framework based on vector quantization for acquiring models of effect classes and their associations with object features. Specifically, they considered robots pushing household objects and observing them with a camera. A limitation of such approaches is that they tend not to generalise well to novel objects and actions.
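The two-stage scheme described above (first discover effect categories, then associate object features with them) can be sketched as follows. This is a deliberately simplified illustration and not the method of any cited paper: effect categories are found with plain k-means, the feature-to-category mapping is a nearest-mean rule, and the single "roundness" feature and all data values are invented.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def discover_effect_categories(effects, initial_centroids, iterations=10):
    """Cluster observed push effects with plain k-means."""
    centroids = list(initial_centroids)
    labels = [0] * len(effects)
    for _ in range(iterations):
        # Assign every effect to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(e, centroids[k]))
                  for e in effects]
        # Move every centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [e for e, lab in zip(effects, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, labels


def learn_feature_means(features, labels):
    """Mean object feature per effect category (nearest-mean classifier)."""
    sums, counts = {}, {}
    for f, lab in zip(features, labels):
        sums[lab] = sums.get(lab, 0.0) + f
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def predict_category(feature, feature_means):
    """Predict the effect category of a push from the object feature alone."""
    return min(feature_means, key=lambda lab: abs(feature - feature_means[lab]))


# Invented trials: (roundness of object, (displacement in m, rotation in rad)).
trials = [
    (0.95, (0.42, 0.02)), (0.90, (0.38, 0.05)), (0.85, (0.45, 0.01)),  # roll away
    (0.10, (0.06, 0.30)), (0.20, (0.05, 0.25)), (0.15, (0.08, 0.35)),  # slide, turn
]
features = [f for f, _ in trials]
effects = [e for _, e in trials]

centroids, labels = discover_effect_categories(effects, [effects[0], effects[3]])
feature_means = learn_feature_means(features, labels)
print(labels, predict_category(0.8, feature_means))
```

A planner could then query `predict_category` for a novel object to anticipate qualitatively whether a push makes it roll away or slide and turn.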

Considering other qualitative approaches than those related to affordances, Zrimec and Mowforth (1991) developed an algorithm for knowledge extraction and representation to predict the effects of pushing. In their experiment, a robot performs random pushes and uses unsupervised learning on those observations. Their method involves partitioning, constructive induction and determination of dependencies.

Hermans et al. (2013) developed a method for predicting contact locations for pushing based on the global and local object shape. In exploratory trials, a robot pushes different objects, recording the objects’ local and global shape features at the pushing contacts. For each observed trajectory, the robot computes a push-stability or rotate-push score and maps shape features to those scores by means of regression. Based on that mapping, the robot can search objects of novel shape for features associated with effective pushes. Experimental results are reported for a mobile manipulator robot pushing household objects in a tabletop set-up.

Figure 7: The sequence of operations adopted by Zrimec and Mowforth (1991) to construct their causality learning model. The robot learns by interacting with the environment in an unsupervised fashion. The system can autonomously discover knowledge, e.g. whether an action generates a push on an object. The “motivation” module guarantees that the system is driven towards acquiring more knowledge about the robot/environment interaction. Reproduced from Zrimec and Mowforth (1991).

While learned affordances, and other qualitative models, can be useful in various scenarios, other applications require the ability to predict the effects of pushing more precisely, e.g. by explicitly predicting six DOF rigid body motions. We consider efforts made to achieve precise predictions in the next section.

6.2 Metrically Precise Models

Early seminal work by Salganicoff et al. (1993) presented a vision-based unsupervised learning method for robot manipulation. A robot pushes an object at a rotational point contact and learns a forward model of the action effects in image space. Subsequently, they used the forward model for stochastic action selection in manipulation planning and control. The scenarios considered in this work are relatively simple in that the pusher remains within the friction cone of the object and the contact only has one rotational DOF. Yet, this work takes an approach that is markedly different from analytical models discussed before. Instead of estimating parameters such as frictional coefficients explicitly, the authors encode that information implicitly in the mapping between actions and their effects in image space. Similarly, Walker and Salisbury (2008) learned a mapping between pushes and object motion as an alternative to explicitly modelling support friction. Set in a 2D tabletop environment, a robot with a single finger pushes objects and uses an online, memory-based local regression model to learn manipulation maps. To achieve this, they explicitly detect the object’s shape using a proximity sensor and fit a shape to the thus obtained point cloud. A method for handling objects with more complex shapes was proposed by Lau et al. (2011). In their work, a robot, while being of simple circular shape itself, aims to deliver irregularly shaped flat objects to a goal position by pushing them. The objects that they consider are chosen to exhibit quasi-static properties. Collecting several hundred random example pushes as training data, a forward model is learned using non-parametric regression, similar to the approach taken by Walker and Salisbury (2008).
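To make the idea of a memory-based, non-parametric forward model concrete, the following sketch stores observed pushes and predicts the effect of a new push as a kernel-weighted average of the stored effects. It is an illustration under stated assumptions only: the Gaussian kernel, the bandwidth, the one-dimensional push parameter (a normalised contact offset) and the synthetic training data are all hypothetical choices, not those of the works cited above.

```python
import math


class MemoryBasedForwardModel:
    """Stores (push, effect) pairs and predicts by locally weighted averaging."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth
        self.memory = []  # list of (push tuple, effect tuple)

    def add(self, push, effect):
        self.memory.append((tuple(push), tuple(effect)))

    def predict(self, push):
        if not self.memory:
            raise ValueError("no training pushes stored")
        # Gaussian kernel weight of every stored push relative to the query.
        weights = []
        for stored_push, _ in self.memory:
            sq = sum((a - b) ** 2 for a, b in zip(push, stored_push))
            weights.append(math.exp(-sq / (2.0 * self.bandwidth ** 2)))
        total = sum(weights)
        if total == 0.0:
            raise ValueError("query is too far from all stored pushes")
        dims = len(self.memory[0][1])
        return tuple(
            sum(w * effect[d] for w, (_, effect) in zip(weights, self.memory)) / total
            for d in range(dims)
        )


# Synthetic example pushes: contact offset along an edge, in [-1, 1].
# Invented ground truth: off-centre pushes rotate more and translate less.
model = MemoryBasedForwardModel(bandwidth=0.1)
for i in range(-10, 11):
    offset = i / 10.0
    translation = 0.1 * (1.0 - abs(offset))  # metres
    rotation = 0.5 * offset                  # radians
    model.add((offset,), (translation, rotation))

translation, rotation = model.predict((0.25,))
print(translation, rotation)
```

No friction coefficient appears anywhere in the model; as in the work discussed above, that information is encoded implicitly in the stored action-effect pairs.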

Figure 8: The example shows the interaction between a 5-axis Katana robotic manipulator and an L-shaped object, called a polyflap (Sloman, 2006). A set of contact experts is learned as probability densities for encoding geometric relations between parts of objects under a push operation. This approach allows these experts to learn physical properties from demonstration, such as non-penetration between an object and a table top, without explicitly representing physics knowledge in the model. The green wire frame denotes the prediction whilst the red wire frame denotes the visual tracking. Courtesy of Kopicki et al. (2011).

Kopicki et al. (2011) presented two data-driven probabilistic methods for predicting the 3D motion of rigid bodies interacting under the quasi-static assumption. First, they formulated the problem as regression and subsequently as density estimation. In Kopicki et al. (2017), they extended this work further. Their architecture is modular in that multiple object- and context-specific forward models are learned which represent different constraints on the object’s motion. A product of experts is used which, contrary to mixture models, does not add but multiplies the different densities. Hence, all constraints, e.g. those imposed by the robot-object contact and multiple object-environment contacts, need to be satisfied for a resulting object motion to be considered probable. This formulation facilitates the transfer of learned motion models to objects of novel shape and to novel actions. In experiments with a robot arm, the method is compared with and found to outperform the physics engine PhysX tuned on the same data. For learning and prediction, their algorithms require access to a point cloud of the object. A further extension of this approach is presented in Stüber et al. (2018). In this work, the authors aim to contribute to endowing robots with versatile non-prehensile manipulation skills. To that end, an efficient data-driven approach to transfer learning for robotic push manipulation is proposed. This approach combines and extends two separate strands of research, one directly concerning pushing manipulation (Kopicki et al., 2017), and one originating from grasping research (Kopicki et al., 2016). The key idea is to learn motion models for robotic pushing that encode knowledge specific to a given type of contact (see Kopicki et al. (2016) for further details). In a previously unseen situation, when the robot needs to push a novel object, the system first establishes how to create a contact with the object’s surface. Such a contact is selected among the learned models, e.g. a flat contact with a cube side or a contact with a cylindrical surface. At the generated contact, the system then applies the appropriate motion model for prediction, similarly to Kopicki et al. (2017). The underlying rationale for this approach to prediction is that predicting on familiar ground reduces the motion models’ sample complexity while using local contact information for prediction increases their transferability.
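The difference between multiplying and adding expert densities can be illustrated with a small numerical sketch. It assumes, purely for illustration, two one-dimensional Gaussian experts over an object displacement evaluated on a grid; the experts, their parameters and the grid are invented and far simpler than the learned densities over rigid body motions used in the cited work.

```python
import math


def gaussian(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


DX = 0.001                            # grid resolution in metres
grid = [i * DX for i in range(201)]   # displacements from 0.000 to 0.200 m

# Hypothetical experts: one for the robot-object contact, one for the
# object-environment contact. Each scores candidate object displacements.
contact_expert = [gaussian(x, 0.10, 0.03) for x in grid]
environment_expert = [gaussian(x, 0.06, 0.02) for x in grid]


def normalise(density):
    """Rescale so that the density integrates to one over the grid."""
    area = sum(density) * DX
    return [d / area for d in density]


# Mixture: densities are added, so satisfying ONE expert is enough.
mixture = normalise([0.5 * a + 0.5 * b
                     for a, b in zip(contact_expert, environment_expert)])
# Product of experts: densities are multiplied, so ALL experts must agree.
product = normalise([a * b
                     for a, b in zip(contact_expert, environment_expert)])

product_mode = grid[max(range(len(grid)), key=lambda i: product[i])]
i_far = 140  # grid index of a 0.14 m displacement
print(product_mode, mixture[i_far], product[i_far])
```

A displacement of 0.14 m is plausible for the contact expert alone, so the mixture still assigns it noticeable density, whereas the product suppresses it because the environment expert considers it very unlikely.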

Meriçli et al. (2015) similarly presented a case-based approach to push-manipulation prediction and planning. Based on experience from self-exploration or demonstration, a robot learns multiple discrete probabilistic motion models for pushing complex 3D objects on caster wheels with a mobile base in cluttered environments. Subsequently, the case models are used for planning pushes to navigate an object to a goal state whilst potentially pushing movable obstacles out of the way. In the process, the robot continues to observe the results of its actions and feeds that data back into the case models, allowing them to improve and adapt.

Figure 9: Top: graphical representation of the feature-based predictors for push operations. The global motion of the object after a push is described by a rigid body transformation, which is unknown to the robot. However, the robot can estimate it by learning a set of local predictors for a set of local motions. Further rigid body transformations describe the estimated contacts on the object’s surface w.r.t. the estimated global frame of the object. Since the object is assumed to be rigid, this relation does not change over time; thus, once the local motions are estimated, the global motion can be estimated by using these relations. Bottom: resulting predictions. Initial object pose (green, in contact with robot), true final object pose (green, displaced), and predictions (blue). Courtesy of Stüber et al. (2018).

6.3 Deep Learning Approaches

Artificial neural networks have been used in robotic pushing to estimate physical parameters, predict the outcome of pushing actions, and for planning and control. Previously, we have seen work concerned with estimating physical parameters of the environment from data. Deep learning has been used to address the same problem. Denil et al. (2016) studied the learning of physical properties such as mass and cohesion of objects in a simulated environment. Using deep reinforcement learning, their robots learn different strategies that balance the cost of gathering information against the cost of inaccurate estimation.

Instead of explicitly estimating physical parameters, another approach is learning a dynamics model. Several studies have investigated learning general physical models or “physical intuition” directly from image data. Chang et al. (2016) presented the Neural Physics Engine, a deep learning framework for learning simple physics simulators. They factorise the environment into object-based representations and decompose dynamics into pairwise interaction between objects. However, their evaluation is limited to simple rigid body dynamics in 2D.

Watters et al. (2017) introduced the Visual Interaction Network, a model for learning the dynamics of a physical system from raw visual observations. First, a convolutional neural network (CNN) generates a factored object representation from visual input. Then, a dynamics predictor based on interaction networks computes predicted trajectories of arbitrary length. They report accurate predictions of trajectories for several hundred time steps using only six input video frames. Yet, their experiments are also limited to rather simple environments, namely 2D simulations of coloured objects on natural-image backgrounds. Similarly, Fragkiadaki et al. (2015) also used an object-centric formulation based on raw visual input for dynamics prediction. Based on object-centric visual glimpses (snippets of an image), the system predicts future states by individually modelling the behaviour of each object. After training in different environments by means of random interaction, they also use their model for planning actions in novel environments. Again, they consider simple 2D worlds, in this case moving balls on a 2D table, i.e. their agent plays 2D billiards. Ehrhardt et al. (2017) constructed a neural network for end-to-end prediction of mechanical phenomena. Their architecture consists of three components: a CNN extracts features from images which are updated by a propagation module, and decoded by an estimation module. What their network outputs is a distribution over outcomes, thus explicitly modelling the inherent uncertainty in manipulation prediction. In terms of experiments, they study the relatively simple problem of a small object sliding down an inclined plane.
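The object-based factorisation with pairwise interactions mentioned above can be sketched in a few lines. The sketch only illustrates the structure of such a model: the next state of each object is computed from the sum of pairwise effects of all other objects on it. The function `pairwise_effect` is a hand-written, hypothetical repulsion rule for overlapping discs; in the learned models discussed here, a neural network would take its place.

```python
import math


def pairwise_effect(focus, other, radius=0.1, stiffness=50.0):
    """Hypothetical stand-in for a learned pairwise interaction: acceleration
    pushing `focus` away from `other` when two discs of equal radius overlap."""
    dx = focus["pos"][0] - other["pos"][0]
    dy = focus["pos"][1] - other["pos"][1]
    dist = math.hypot(dx, dy)
    overlap = 2.0 * radius - dist
    if overlap <= 0.0 or dist == 0.0:
        return (0.0, 0.0)
    return (stiffness * overlap * dx / dist, stiffness * overlap * dy / dist)


def step(objects, dt=0.01):
    """Predict the next state of every object from summed pairwise effects."""
    next_objects = []
    for i, focus in enumerate(objects):
        ax = ay = 0.0
        for j, other in enumerate(objects):
            if i != j:
                ex, ey = pairwise_effect(focus, other)
                ax += ex
                ay += ey
        vx = focus["vel"][0] + ax * dt
        vy = focus["vel"][1] + ay * dt
        x = focus["pos"][0] + vx * dt
        y = focus["pos"][1] + vy * dt
        next_objects.append({"pos": (x, y), "vel": (vx, vy)})
    return next_objects


discs = [
    {"pos": (0.00, 0.0), "vel": (0.0, 0.0)},  # overlaps with the second disc
    {"pos": (0.15, 0.0), "vel": (0.0, 0.0)},
    {"pos": (1.00, 1.0), "vel": (0.0, 0.0)},  # far away, unaffected
]
after = step(discs)
print(after)
```

Because the same pairwise function is applied to every pair, the model applies unchanged to scenes with a different number of objects.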

Figure 10: Frame-centric model for motion prediction of billiard balls. The model takes as input the 2D image of the billiard and the forces applied by the agent to make predictions about the future configurations of the balls. Reproduced from Fragkiadaki et al. (2015).

Moving towards more complex scenarios, Byravan and Fox (2016) introduced SE3-NETS, a deep neural network architecture for predicting 3D rigid body motions. Instead of RGB images, their network takes depth images as input, together with continuous action vectors, and associations between points in subsequent images. SE3-NETS segment point clouds into object parts and predict their motion in the form of transformations. They report that their method outperforms flow-based networks on simulated depth data of a tabletop manipulation scenario. Furthermore, they demonstrate that it performs well on real depth images of a Baxter robot pushing objects. However, their approach requires that associations between depth points are provided. They aim to learn those automatically in future work and to apply SE3-NETS to non-rigid body motion, recurrent prediction, and control tasks. A different approach to learning dynamics from images was taken by Agrawal et al. (2016). They jointly learn forward and inverse models of dynamics of robotic arm operation that can be used for poking objects. In doing so, they extract features from raw images and make predictions in that feature space. In real-world experiments with Baxter, their model is used to move objects to target locations by poking. In order to cope with the real world, their model requires training on large amounts of data, which their robot gathered by poking different objects over many hours.
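The idea of segmenting a point cloud into parts and moving each part with its own rigid transformation can be sketched as follows. The sketch makes several simplifying assumptions that should not be read as the published architecture: the segmentation weights and part motions are given by hand rather than predicted by a network, rotations are restricted to the vertical axis for brevity, and points are moved by a weighted blend of the part transformations.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def apply_part_motions(points, masks, motions):
    """Move every point by the mask-weighted blend of the part motions.

    points:  list of (x, y, z) tuples
    masks:   per point, a list of K weights (assumed to sum to one)
    motions: list of K (angle, translation) pairs, one per part
    """
    moved = []
    for point, weights in zip(points, masks):
        blended = [0.0, 0.0, 0.0]
        for weight, (angle, translation) in zip(weights, motions):
            rotated = rotate_z(point, angle)
            for axis in range(3):
                blended[axis] += weight * (rotated[axis] + translation[axis])
        moved.append(tuple(blended))
    return moved


# Two points: the first belongs to a pushed object, the second to the table.
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
masks = [[1.0, 0.0], [0.0, 1.0]]
motions = [
    (math.pi / 2.0, (0.0, 0.0, 0.1)),  # pushed part: quarter turn plus lift
    (0.0, (0.0, 0.0, 0.0)),            # static part: identity motion
]
moved = apply_part_motions(points, masks, motions)
print(moved)
```

With hard (one-hot) masks, as in this example, each point simply follows the rigid motion of the part it belongs to.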

Figure 11: Object-centric model for motion prediction of billiard balls. The system predicts the future configurations of the balls by individually modelling the temporal evolution of each ball. In this scenario, predicting the velocities of each ball is sufficient for computing the next configuration of the billiard. Reproduced from Fragkiadaki et al. (2015).

Most of the studies presented in this section make use of object-centric representations to model dynamics. Approaches that predict motion without such a representation have been explored as well. For instance, Finn et al. (2016) developed an action-conditioned video prediction system which predicts a distribution over pixel motions only based on previous frames. No information concerning object appearance is provided to the model. It borrows that information from previous frames and merges it with model predictions. It is this mechanism that allows the model to generalise to previously unseen objects. By conditioning predictions on an action, the model can effectively imagine the action’s consequences. As with previously presented deep learning models, this approach also requires large amounts of data to perform well in real-world situations. Hence, the authors have collected a dataset of robot pushing motions (frames associated with the action being applied) on different objects. While their results demonstrate that no object-centric representation is required for prediction, the authors argue that such representations are a promising direction for research as they provide concise state representations for use in reinforcement learning.

We have seen how artificial neural networks can be used to model the dynamics of physical systems. In addition to that, deep reinforcement learning has been used to learn control policies in the field of robotic pushing. Many of those approaches make use of dynamics models so that they can be seen as complementary to the work presented before. We do not provide a detailed review of this very active field here, rather referring the reader to Levine et al. (2015), Levine et al. (2016), Levine and Finn (2017), and Ghadirzadeh et al. (2017) for overviews of such work.

7 Final Remarks

In this paper we have provided an overview of the problem of robot pushing and summarised the development of the state-of-the-art, focusing on the problem of motion prediction of the object to be pushed. We have also covered some aspects of relevant applications of pushing for planning and control.

Typical approaches have been classified as i) purely analytical, ii) hybrid, iii) dynamic analysis, iv) physics engines based, v) data-driven, and vi) deep learning. Representative work for each of these categories has been listed for readers to have a general overview of the field and its state-of-the-art from the earlier work in the 1980s to the most recent approaches.

A set of assumptions in the proposed methods has been highlighted. Earlier work has mostly investigated motion prediction under the quasi-static assumption to avoid complex dynamics, and provided the groundwork for understanding the mechanics of pushing 2D shapes (Mason, 1986b). This seminal work has been extended extensively to more realistic scenarios involving 3D objects to be pushed. Nonetheless, as we have seen, there are two types of uncertainty that affect manipulation problems: i) prediction uncertainty and ii) state uncertainty. The majority of the papers that investigated the extension to 3D objects relied on the assumption that the geometrical properties of the object to be pushed were known a priori, e.g. Mason (1990); Mayeda and Wakatsuki (1991). Key physical properties that would affect the prediction, e.g. mass distribution or friction coefficients, were typically assumed to be known or possible to estimate on the fly, as in Yoshikawa and Kurisu (1991), by combining data-driven methods with the analytical mechanics of pushing.

More recently, a few efforts were made towards robot pushers that can also deal with state uncertainty. By relaxing the assumption that the model of the object to be pushed is known, the robot typically perceives the object as a point cloud or RGB image to estimate its geometric properties, such as pose and shape, before even attempting to make a motion prediction, see Fragkiadaki et al. (2015); Stüber et al. (2018).

Two strands of approaches can be identified. The first is the data-driven approach that attempts to learn from experience how an object behaves under a push operation. Qualitative models have investigated the concept of affordances for learning a mapping between object features and possible push actions, which they then employ for planning, e.g. Zrimec and Mowforth (1991). In contrast, metrically precise models have investigated how to learn a mapping between actions and their effects, e.g. Kopicki et al. (2017). A second, more recent strand is the application of deep learning techniques to learn a physical intuition of the mechanics of pushing from visual data, see Fragkiadaki et al. (2015). Both strands typically model the predictions in a probabilistic framework to estimate the most likely outcome of an action given the information available, e.g. an image of the scene or contact models. The latter approach is very promising, but it requires a massive amount of data for the model to learn. Hence, such approaches typically rely on synthetic data from physics engines. In contrast, the works by Stüber et al. (2018) and Meriçli et al. (2015) have demonstrated that it is possible to learn motion prediction for complex push operations efficiently and generalise the model’s predictions to previously unseen objects.

While some typical problems still require a better solution, new challenges and requirements are emerging in the field. To make pushing an essential motor primitive in practical robotics, these challenges are either currently under investigation in research groups worldwide or need to be investigated in the future. In the following, we list some open problems that we have identified.

7.1 Understanding and Semantic Representation

The scene is typically perceived as an RGB image or a point cloud. However, for robot pushing, we need to be able to distinguish pushable objects from static ones. Labelling can be done but it is very expensive in terms of human labour. Converting from source image data to geometrical shapes, and from geometrical shapes to semantic representations, will be beneficial for the robot. Once the robot can identify probable dynamic objects, it would be able to interact with the environment, prioritising those objects and improving its understanding.

7.2 Sensory Fusion and Feedback

Multiple sensor inputs are nowadays available for robotic systems. Instead of solely relying on vision, other sources of information should be used to close the loop of the manipulation. Tactile, proprioceptive, and visual feedback should be fused together to enable the robot to perform complex manipulation and recover from failures.

7.3 Explicitly Modelling Uncertainty in the Model

Due to a lack of perfect perception abilities, it is not unusual that a robot has to operate with an incomplete description of its environment. In robot pushing, and more generally in manipulation, the robot needs to generate a set of contacts to interact with other objects. When the pose of the object to be manipulated is unknown, what is the best way to create a robust set of contacts? In the case of planning for dexterous manipulation, our previous work in Zito et al. (2013) has demonstrated that approaching directions that maximise the likelihood of gathering (tactile) information are more likely to achieve a successful set of contacts for a grasp. This was tested in the case where, due to imperfect perception abilities, the description of the object to be grasped is incomplete and hence the pose of the object is uncertain. This empirically suggests that reasoning about uncertainty leads to reach-to-grasp trajectories that are more robust with respect to object-pose uncertainty. Similarly, selecting an action for its physical effects (e.g. pushing, push and grasp) should benefit from incorporating state uncertainty with respect to the initial pose estimate of the object.

7.4 Cooperative Robots and Multiple Contacts Pushing

In warehouses, for example, there exists the problem of moving large-scale objects. Collaborative robots may be able to complete the task. Besides the problems of sharing sensitive information between them and coordinating their efforts, a new challenge arises from the manipulation point of view. Multi-contact pushing is hard to predict, especially when the actions are carried out by multiple agents. Control and decision making are critical issues in such systems.

7.5 Real-world Applications

Although the theory behind motion prediction is well established and applications to simple, structured scenarios have been made, the existing methods have not yet been combined with industrial applications. Robots in warehouses can navigate freely and deliver goods; however, no robotic system is capable of exploiting pushing operations to perform tasks such as inserting a box onto an overhead shelf. Theoretical solutions are rarely reliable in practical engineering applications; hence, many sophisticated practical approaches will be needed in the future.

Author Contributions

JS is the main author of this paper and collected the literature. CZ is the leading supervisor of this work and he has co-written the paper. RS has co-supervised and funded this project.


This work was supported by UK Engineering and Physical Sciences Research Council (EPSRC No. EP/R02572X/1) for the National Centre for Nuclear Robotics (NCNR).


  • Agarwal et al. (1997) Agarwal, P. K., Latombe, J. C., Motwani, R., and Raghavan, P. (1997). Nonholonomic path planning for pushing a disk among obstacles. In Proceedings of International Conference on Robotics and Automation. vol. 4, 3124–3129. doi:10.1109/ROBOT.1997.606763
  • Agrawal et al. (2016) Agrawal, P., Nair, A., Abbeel, P., Malik, J., and Levine, S. (2016). Learning to poke by poking: Experiential learning of intuitive physics. CoRR abs/1606.07419
  • Şahin et al. (2007) Şahin, E., Çakmak, M., Doğar, M. R., Uğur, E., and Üçoluk, G. (2007). To afford or not to afford: A new formalization of affordances toward affordance-based robot control. Adaptive Behavior 15, 447–472
  • Akella and Mason (1992) Akella, S. and Mason, M. T. (1992). Posing polygonal objects in the plane by pushing. In Proceedings 1992 IEEE International Conference on Robotics and Automation. vol. 3, 2255–2262. doi:10.1109/ROBOT.1992.219923
  • Akella and Mason (1998) Akella, S. and Mason, M. T. (1998). Posing polygonal objects in the plane by pushing. The International Journal of Robotics Research 17, 70–88
  • Alexander and Maddocks (1993) Alexander, J. and Maddocks, J. (1993). Bounds on the friction-dominated motion of a pushed object. The International journal of robotics research 12, 231–248
  • Bauzá and Rodriguez (2017) Bauzá, M. and Rodriguez, A. (2017). A probabilistic data-driven model for planar pushing. CoRR abs/1704.03033
  • Behrens (2013) Behrens, M. J. (2013). Robotic manipulation by pushing at a single point with constant velocity: Modeling and techniques. Ph.D. thesis, University of Technology, Sydney
  • Brost (1988) Brost, R. C. (1988). Automatic grasp planning in the presence of uncertainty. The International Journal of Robotics Research 7, 3–17
  • Brost (1992) Brost, R. C. (1992). Dynamic analysis of planar manipulation tasks. In Proceedings 1992 IEEE International Conference on Robotics and Automation. vol. 3, 2247–2254. doi:10.1109/ROBOT.1992.219924
  • Byravan and Fox (2016) Byravan, A. and Fox, D. (2016). Se3-nets: Learning rigid body motion using deep neural networks. CoRR abs/1606.02378
  • Cappelleri et al. (2006) Cappelleri, D. J., Fink, J., Mukundakrishnan, B., Kumar, V., and Trinkle, J. C. (2006). Designing open-loop plans for planar micro-manipulation. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006. 637–642. doi:10.1109/ROBOT.2006.1641782
  • Chang et al. (2016) Chang, M. B., Ullman, T., Torralba, A., and Tenenbaum, J. B. (2016). A compositional object-based approach to learning physical dynamics. CoRR abs/1612.00341
  • Chavan-Dafle and Rodriguez (2015) Chavan-Dafle, N. and Rodriguez, A. (2015). Prehensile pushing: In-hand manipulation with push-primitives. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 6215–6222. doi:10.1109/IROS.2015.7354264
  • Chung and Pollard (2016) Chung, S.-J. and Pollard, N. (2016). Predictable behavior during contact simulation: a comparison of selected physics engines. Computer Animation and Virtual Worlds 27, 262–270. doi:10.1002/cav.1712
  • Cosgun et al. (2011) Cosgun, A., Hermans, T., Emeli, V., and Stilman, M. (2011). Push planning for object placement on cluttered table surfaces. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. 4627–4632. doi:10.1109/IROS.2011.6094737
  • de Berg and Gerrits (2010) de Berg, M. and Gerrits, D. H. P. (2010). Computing push plans for disk-shaped robots. In 2010 IEEE International Conference on Robotics and Automation. 4487–4492. doi:10.1109/ROBOT.2010.5509937
  • Denil et al. (2016) Denil, M., Agrawal, P., Kulkarni, T. D., Erez, T., Battaglia, P., and de Freitas, N. (2016). Learning to perform physics experiments via deep reinforcement learning. arXiv preprint arXiv:1611.01843
  • Dogar and Srinivasa (2011) Dogar, M. and Srinivasa, S. (2011). A framework for push-grasping in clutter. Robotics: Science and systems VII 1
  • Dogar and Srinivasa (2010) Dogar, M. R. and Srinivasa, S. S. (2010). Push-grasping with dexterous hands: Mechanics and a method. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2123–2130. doi:10.1109/IROS.2010.5652970
  • Ehrhardt et al. (2017) Ehrhardt, S., Monszpart, A., Mitra, N. J., and Vedaldi, A. (2017). Learning A physical long-term predictor. CoRR abs/1703.00247
  • Emery and Balch (2001) Emery, R. and Balch, T. (2001). Behavior-based control of a non-holonomic robot in pushing tasks. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164). vol. 3, 2381–2388. doi:10.1109/ROBOT.2001.932978
  • Erez et al. (2015) Erez, T., Tassa, Y., and Todorov, E. (2015). Simulation tools for model-based robotics: Comparison of bullet, havok, mujoco, ode and physx. In 2015 IEEE International Conference on Robotics and Automation (ICRA). 4397–4404. doi:10.1109/ICRA.2015.7139807
  • Ferguson et al. (2004) Ferguson, D., Morris, A., Haehnel, D., Baker, C., Omohundro, Z., Reverte, C., et al. (2004). An autonomous robotic system for mapping abandoned mines. In Advances in Neural Information Processing Systems. 587–594
  • Finn et al. (2016) Finn, C., Goodfellow, I., and Levine, S. (2016). Unsupervised learning for physical interaction through video prediction. In Advances in Neural Information Processing Systems. 64–72
  • Fragkiadaki et al. (2015) Fragkiadaki, K., Agrawal, P., Levine, S., and Malik, J. (2015). Learning visual predictive models of physics for playing billiards. CoRR abs/1511.07404
  • Ghadirzadeh et al. (2017) Ghadirzadeh, A., Maki, A., Kragic, D., and Björkman, M. (2017). Deep predictive policy training using reinforcement learning. CoRR abs/1703.00727
  • Gibson (1979) Gibson, J. J. (1979). The Ecological Approach to Visual Perception (Houghton Mifflin)
  • Goyal et al. (1991) Goyal, S., Ruina, A., and Papadopoulos, J. (1991). Planar sliding with dry friction. Part 1. Limit surface and moment function. Wear 143, 307–330
  • Hermans et al. (2013) Hermans, T., Li, F., Rehg, J. M., and Bobick, A. F. (2013). Learning contact locations for pushing and orienting unknown objects. In 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids). 435–442. doi:10.1109/HUMANOIDS.2013.7030011
  • Hogan et al. (2018) Hogan, F. R., Bauzá, M., and Rodriguez, A. (2018). A data-efficient approach to precise and controlled pushing. CoRR abs/1807.09904
  • Howe and Cutkosky (1996) Howe, R. D. and Cutkosky, M. R. (1996). Practical force-motion models for sliding manipulation. The International Journal of Robotics Research 15, 557–572
  • Igarashi et al. (2010) Igarashi, T., Kamiyama, Y., and Inami, M. (2010). A dipole field for object delivery by pushing on a flat surface. In 2010 IEEE International Conference on Robotics and Automation. 5114–5119. doi:10.1109/ROBOT.2010.5509483
  • Jia and Erdmann (1999) Jia, Y.-B. and Erdmann, M. (1999). Pose and motion from contact. The International Journal of Robotics Research 18, 466–487
  • Khatib (1986) Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. The international journal of robotics research 5, 90–98
  • King (2016) King, J. E. (2016). Robust Rearrangement Planning Using Nonprehensile Interaction. Ph.D. thesis, Carnegie Mellon University
  • King et al. (2013) King, J. E., Klingensmith, M., Dellin, C. M., Dogar, M. R., Velagapudi, P., Pollard, N. S., et al. (2013). Pregrasp manipulation as trajectory optimization. In Robotics: Science and Systems
  • Kolbert et al. (2017) Kolbert, R., Chavan-Dafle, N., and Rodriguez, A. (2017). Experimental Validation of Contact Dynamics for In-Hand Manipulation (Cham: Springer International Publishing). 633–645
  • Kopicki et al. (2016) Kopicki, M., Detry, R., Adjigble, M., Stolkin, R., Leonardis, A., and Wyatt, J. L. (2016). One-shot learning and generation of dexterous grasps for novel objects. The International Journal of Robotics Research 35, 959–976
  • Kopicki et al. (2017) Kopicki, M., Zurek, S., Stolkin, R., Moerwald, T., and Wyatt, J. L. (2017). Learning modular and transferable forward models of the motions of push manipulated objects. Autonomous Robots 41, 1061–1082
  • Kopicki et al. (2011) Kopicki, M., Zurek, S., Stolkin, R., Mörwald, T., and Wyatt, J. (2011). Learning to predict how rigid objects behave under simple manipulation. In 2011 IEEE International Conference on Robotics and Automation. 5722–5729. doi:10.1109/ICRA.2011.5980295
  • Kurisu and Yoshikawa (1995) Kurisu, M. and Yoshikawa, T. (1995). Trajectory planning for an object in pushing operation. Journal of the Robotics Society of Japan 13, 1115–1121
  • Lau et al. (2011) Lau, M., Mitani, J., and Igarashi, T. (2011). Automatic learning of pushing strategy for delivery of irregular-shaped objects. In 2011 IEEE International Conference on Robotics and Automation. 3733–3738. doi:10.1109/ICRA.2011.5979740
  • LaValle (1998) LaValle, S. (1998). Rapidly-exploring random trees: A new tool for path planning. Tech. rep., CS Dept, Iowa State University
  • Lee et al. (2015) Lee, G., Lozano-Pérez, T., and Kaelbling, L. P. (2015). Hierarchical planning for multi-contact non-prehensile manipulation. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 264–271
  • Lee and Cutkosky (1991) Lee, S. H. and Cutkosky, M. R. (1991). Fixture planning with friction. Journal of Manufacturing Science and Engineering 113, 320–327
  • Levine and Finn (2017) Levine, S. and Finn, C. (2017). Deep visual foresight for planning robot motion. ICRA
  • Levine et al. (2016) Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1334–1373
  • Levine et al. (2015) Levine, S., Wagener, N., and Abbeel, P. (2015). Learning contact-rich manipulation skills with guided policy search. In 2015 IEEE International Conference on Robotics and Automation (ICRA). 156–163
  • Lynch (1993) Lynch, K. M. (1993). Estimating the friction parameters of pushed objects. In Intelligent Robots and Systems ’93, IROS ’93. Proceedings of the 1993 IEEE/RSJ International Conference on. vol. 1, 186–193
  • Lynch et al. (1992) Lynch, K. M., Maekawa, H., and Tanie, K. (1992). Manipulation and active sensing by pushing using tactile feedback. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. vol. 1, 416–421. doi:10.1109/IROS.1992.587370
  • Lynch and Mason (1996) Lynch, K. M. and Mason, M. T. (1996). Stable pushing: Mechanics, controllability, and planning. The International Journal of Robotics Research 15, 533–556
  • Mason (1982) Mason, M. T. (1982). Manipulator grasping and pushing operations. Ph.D. thesis, Massachusetts Institute of Technology
  • Mason (1986a) Mason, M. T. (1986a). Mechanics and planning of manipulator pushing operations. The International Journal of Robotics Research 5, 53–71
  • Mason (1986b) Mason, M. T. (1986b). On the scope of quasi-static pushing. In International Symposium on Robotics Research (MIT Press), 229–233
  • Mason (1990) Mason, M. T. (1990). Compliant sliding of a block along a wall (Berlin, Heidelberg: Springer Berlin Heidelberg). 568–578. doi:10.1007/BFb0042542
  • Mayeda and Wakatsuki (1991) Mayeda, H. and Wakatsuki, Y. (1991). Strategies for pushing a 3D block along a wall. In Intelligent Robots and Systems ’91. ’Intelligence for Mechanical Systems, Proceedings IROS ’91. IEEE/RSJ International Workshop on. vol. 2, 461–466. doi:10.1109/IROS.1991.174512
  • Meriçli et al. (2015) Meriçli, T., Veloso, M., and Akın, H. L. (2015). Push-manipulation of complex passive mobile objects using experimentally acquired motion models. Autonomous Robots 38, 317–329
  • Min et al. (2016) Min, H., Yi, C., Luo, R., Zhu, J., and Bi, S. (2016). Affordance research in developmental robotics: A survey. IEEE Transactions on Cognitive and Developmental Systems 8, 237–255
  • Miyazawa et al. (2005) Miyazawa, K., Maeda, Y., and Arai, T. (2005). Planning of graspless manipulation based on rapidly-exploring random trees. In (ISATP 2005). The 6th IEEE International Symposium on Assembly and Task Planning: From Nano to Macro Assembly and Manufacturing, 2005. 7–12
  • Moldovan et al. (2012) Moldovan, B., Moreno, P., van Otterlo, M., Santos-Victor, J., and De Raedt, L. (2012). Learning relational affordance models for robots in multi-object manipulation tasks. In Robotics and Automation (ICRA), 2012 IEEE International Conference on (IEEE), 4373–4378
  • Narasimhan (1994) Narasimhan, S. (1994). Task level strategies for robots. Ph.D. thesis, Massachusetts Institute of Technology
  • Nieuwenhuisen et al. (2005) Nieuwenhuisen, D., van der Stappen, A. F., and Overmars, M. H. (2005). Path planning for pushing a disk using compliance. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. 714–720. doi:10.1109/IROS.2005.1545603
  • Peshkin and Sanderson (1988a) Peshkin, M. A. and Sanderson, A. C. (1988a). The motion of a pushed, sliding workpiece. IEEE Journal on Robotics and Automation 4, 569–598. doi:10.1109/56.9297
  • Peshkin and Sanderson (1988b) Peshkin, M. A. and Sanderson, A. C. (1988b). Planning robotic manipulation strategies for workpieces that slide. IEEE Journal on Robotics and Automation 4, 524–531
  • Ridge et al. (2015) Ridge, B., Leonardis, A., Ude, A., Deniša, M., and Skočaj, D. (2015). Self-supervised online learning of basic object push affordances. International Journal of Advanced Robotic Systems 12, 24
  • Ruiz-Ugalde et al. (2010) Ruiz-Ugalde, F., Cheng, G., and Beetz, M. (2010). Prediction of action outcomes using an object model. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. 1708–1713. doi:10.1109/IROS.2010.5649552
  • Ruiz-Ugalde et al. (2011) Ruiz-Ugalde, F., Cheng, G., and Beetz, M. (2011). Fast adaptation for effect-aware pushing. In 2011 11th IEEE-RAS International Conference on Humanoid Robots. 614–621. doi:10.1109/Humanoids.2011.6100863
  • Salganicoff et al. (1993) Salganicoff, M., Metta, G., Oddera, A., and Sandini, G. (1993). A vision-based learning method for pushing manipulation. In AAAI Fall Symposium Series on Machine Learning in Vision: What Why and How?
  • Scholz et al. (2014) Scholz, J., Levihn, M., Isbell, C., and Wingate, D. (2014). A physics-based model prior for object-oriented MDPs. In Proceedings of the 31st International Conference on Machine Learning (ICML-14). 1089–1097
  • Sloman (2006) Sloman, A. (2006). Polyflaps as a domain for perceiving, acting and learning in a 3-D world. In Position Papers for 2006 AAAI Fellows Symposium (AAAI)
  • Stüber et al. (2018) Stüber, J., Kopicki, M., and Zito, C. (2018). Feature-based transfer learning for robotic push manipulation. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA)
  • Ugur et al. (2011) Ugur, E., Oztop, E., and Sahin, E. (2011). Goal emulation and planning in perceptual space using learned affordances. Robotics and Autonomous Systems 59, 580–595
  • Walker and Salisbury (2008) Walker, S. and Salisbury, J. K. (2008). Pushing using learned manipulation maps. In 2008 IEEE International Conference on Robotics and Automation. 3808–3813. doi:10.1109/ROBOT.2008.4543795
  • Watters et al. (2017) Watters, N., Tacchetti, A., Weber, T., Pascanu, R., Battaglia, P., and Zoran, D. (2017). Visual interaction networks. arXiv preprint arXiv:1706.01433
  • Yoshikawa and Kurisu (1991) Yoshikawa, T. and Kurisu, M. (1991). Indentification of the center of friction from pushing an object by a mobile robot. In Intelligent Robots and Systems ’91. ’Intelligence for Mechanical Systems, Proceedings IROS ’91. IEEE/RSJ International Workshop on. vol. 2, 449–454
  • Yu et al. (2016) Yu, K. T., Bauza, M., Fazeli, N., and Rodriguez, A. (2016). More than a million ways to be pushed: A high-fidelity experimental dataset of planar pushing. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 30–37. doi:10.1109/IROS.2016.7758091
  • Zhou et al. (2017) Zhou, J., Bagnell, J. A., and Mason, M. T. (2017). A fast stochastic contact model for planar pushing and grasping: Theory and experimental validation. arXiv preprint arXiv:1705.10664
  • Zhou et al. (2016) Zhou, J., Paolini, R., Bagnell, J. A., and Mason, M. T. (2016). A convex polynomial force-motion model for planar sliding: Identification and application. In 2016 IEEE International Conference on Robotics and Automation (ICRA). 372–377. doi:10.1109/ICRA.2016.7487155
  • Zhu et al. (2017) Zhu, S., Kimmel, A., and Boularias, A. (2017). Information-theoretic model identification and policy search using physics engines with application to robotic manipulation. CoRR abs/1703.07822
  • Zito et al. (2013) Zito, C., Kopicki, M., Stolkin, R., Borst, C., Schmidt, F., Roa, M. A., et al. (2013). Sequential trajectory re-planning with tactile information gain for dextrous grasping under object-pose uncertainty. In Proceedings of IEEE International Conference on Intelligent Robots and Systems (IROS). 2013–2040
  • Zito et al. (2012) Zito, C., Stolkin, R., Kopicki, M., and Wyatt, J. L. (2012). Two-level RRT planning for robotic push manipulation. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. 678–685
  • Zrimec and Mowforth (1991) Zrimec, T. and Mowforth, P. (1991). Learning by an autonomous agent in the pushing domain. Robotics and Autonomous Systems 8, 19–29. Special Issue Toward Learning Robots