An Adaptive Framework for Reliable Trajectory Following in Changing-Contact Robot Manipulation Tasks

11/15/2021
by Saif Sidhik, et al.
University of Birmingham

We describe a framework for changing-contact robot manipulation tasks that require the robot to make and break contacts with objects and surfaces. The discontinuous interaction dynamics of such tasks make it difficult to construct and use a single dynamics model or control strategy, and the highly non-linear nature of the dynamics during contact changes can be damaging to the robot and the objects. We present an adaptive control framework that enables the robot to incrementally learn to predict contact changes in a changing contact task, learn the interaction dynamics of the piece-wise continuous system, and provide smooth and accurate trajectory tracking using a task-space variable impedance controller. We experimentally compare the performance of our framework against that of representative control methods to establish that the adaptive control and incremental learning components of our framework are needed to achieve smooth control in the presence of discontinuous dynamics in changing-contact robot manipulation tasks.


1 Introduction

Consider the robot manipulator in Figure 1(a) that has to move its end-effector along a motion pattern requiring it to make and break contact with objects and surfaces. This task’s dynamics, i.e., the relationship between the forces acting on the robot and the resultant accelerations, vary markedly before and after contact (e.g., after motion ‘1’ in the figure). They also vary based on other factors such as the type of contact, surface friction (e.g., in the middle of motions ‘2’ and ‘4’), and applied force. Many industrial assembly tasks, e.g., peg insertion, screwing and stacking, and many human manipulation tasks are instances of such ‘changing-contact’ tasks. The interaction dynamics of a robot performing these tasks are discontinuous when a contact is made or broken and continuous elsewhere, making it difficult to construct a single dynamics model. It is possible to construct a hybrid model with separate continuous dynamics and distinct control laws within each of a number of discrete dynamic modes; the overall dynamics are then piece-wise continuous, with the robot transitioning between modes as needed (Kroemer et al., 2019). Even then, the highly non-linear nature of such interactions results in large discontinuities in the overall motion dynamics at the regions of transition, in the form of high forces, jerk, and vibrations. These sudden and uncontrolled disturbances during contact changes such as collisions can damage the robot and/or the objects involved. The robot therefore also has to learn to handle these discontinuities such that the overall motion is smooth and safe.

An ideal and efficient control framework for executing a provided manipulation task plan should therefore follow the plan accurately while ensuring that: (i) it uses low stiffness parameters whenever possible so as to be compliant and expend less energy; (ii) it can handle the continuously changing interaction dynamics that arise when the robot is in continuous contact with an object or the environment as part of the task; (iii) it can handle the piecewise-continuous interaction dynamics that occur due to discrete changes in the interacting environment or the making/breaking of contacts; (iv) it can quickly learn to deal with previously unseen environments (interaction dynamics) and use an appropriate adaptive controller for navigating them; (v) it can learn to anticipate collisions and contact changes in the task, and learn and switch to a control strategy that produces reduced impact forces and vibrations at these transitions; and (vi) it requires as few trials as possible to learn to perform the task efficiently (where ‘efficiency’ is defined in terms of trajectory-tracking accuracy, controller stiffness, delays in task completion, low impact forces, and overall smoothness of motion).

Figure 1: (a) Changing-contact task where the robot experiences discontinuities in dynamics due to different surface friction in the middle of motions “2” and “4”, and due to collisions at the end of “1”, “2” and “4”; (b) Simulated environment used in experiments in Section 4.4; robot approaches table from top, slides end-effector (green) along table until it collides with wall, and slides along wall and table; (c) 2D multi-spring environment with the robot lagging behind target using AIC (used in Sections 4.1 and 4.2): red block is robot end-effector, green lines are springs attached to end-effector.

This paper describes an online adaptive framework for smooth control of changing-contact robot manipulation tasks that tries to satisfy these conditions. It unifies our work on a variable impedance controller for continuous contact tasks (Mathew et al., 2019), an extension for piecewise-continuous dynamics due to contact changes (Sidhik et al., 2020), and a recent study which explored a contact-change-handling module for dealing with the discontinuities in dynamics during contact changes (Sidhik et al., 2021). We advocate the need for a hybrid model with one or more dynamics modes, each modeled for a contact mode with a predictive (forward) dynamics model of sensor measurements, a control law, and a relevance condition. The choice of representation for learning the hybrid models enables efficient adaptation of the dynamics model of each mode and makes the identification of modes independent of the motion direction and magnitude of applied forces. Our controller within a mode is inspired by research in human motor control, which indicates that people learn to adapt arm stiffness to any new task by building and using internal models of task dynamics with generic (domain/task-independent) and specific (domain/task-dependent) representations to predict hand or object configurations and the forces during task execution (Kawato, 1999).

For sensor measurements and system feedback, we rely purely on the robot’s end-effector poses/velocities and force-torque estimates at the end-effector of the robot. Our framework is capable of using external sensors such as cameras for providing initial estimates of the contact modes and/or contact locations, but assumes these are not always available due to their unreliability in the context of changing-contact manipulation tasks. The motion pattern, predictors, and control laws were defined in Cartesian space to simplify the learning and control problems, as well as to make the framework more generalisable and independent of the robot platform. The contribution of this paper is a study of the unified framework that combines the following three main components:

  1. An adaptive variable impedance control strategy to be used within a continuous contact mode, with the capability to automatically and incrementally adapt the forward model of the mode’s dynamics, which is used to revise the gain (i.e., stiffness) parameters of the control law for accurate and compliant motion in that mode.

  2. A hierarchical model learning approach to model piecewise-continuous dynamics of changing-contact manipulation tasks without prior knowledge of all its modes or the order in which they appear. The approach also includes a mode-detection model that automatically identifies the modes of any given task, and transitions to existing or new modes during task execution.

  3. A simple and efficient contact-change-handling module that can predict and incrementally improve the estimates of contact locations, use the estimates to minimize time spent in the transition phase, and achieve smooth motion dynamics during mode transitions by adapting controller parameters automatically.

The framework is tested in changing-contact manipulation setups such as those shown in Figure 1. Components and features of the framework are evaluated against existing baselines, mainly in controlled simulated setups that provide noiseless and unbiased measurements. The adaptive variable impedance control component for continuous dynamics (the base controller) is experimentally compared with other adaptive control strategies from the literature in a custom-built simulated world with different types of continuously changing environments. Then, the need for an incremental hybrid learning and control strategy for handling discontinuities in dynamics is demonstrated using a fully offline long-term prediction framework as the baseline, which is then incrementally modified to show the advantages of certain representational choices in the formulation of our hybrid framework. We first review related work in Section 2, describe our framework in Section 3, discuss experimental results in Section 4, and conclude in Section 5.

2 Related work

A robot manipulator typically interacts with the environment through contacts. These interactions can be discrete, such as when making/breaking contact (e.g., assembly tasks), or continuous (e.g., polishing, deburring). This section reviews work from the literature that attempts to learn such piecewise-continuous interactions and to develop controllers for handling non-linear systems.

2.1 Forward models in manipulation

Since it is difficult to provide the robot with the comprehensive knowledge of the task and domain required to perform a desired manipulation task, the robot is often enabled to learn from experience, e.g., to create a predictive (forward) model of action effects and use it to select actions (Beetz et al., 2010). Studies in human motor control also indicate the creation and use of forward models for different control tasks (Flanagan et al., 2003).

Forward models broadly fall into three categories: (i) analytic models, which are typically derived mathematically from Newtonian physics; (ii) learned models, built from data using machine learning techniques; and (iii) hybrid models, a combination of learned and analytic models. Analytic models use knowledge of mechanics to make predictions about robot and object motions (Fan et al., 2017; Liu, 2009), relying on assumptions such as quasi-static mechanics, zero slippage, and point contacts (Chatterjee, 1999; Fazeli et al., 2020). Hence, analytic methods often suffer from inaccurate predictions. They also often require an explicit representation of intrinsic parameters, such as friction, mass, mass distribution, and coefficients of restitution, which are not trivial to estimate (Kopicki et al., 2017).

The second approach is to use techniques from machine learning to build forward models of the system. These techniques learn an action-effect correlation either using data obtained from expert demonstrations (Kronander and Billard, 2014; Huang et al., 2016) or from self-experience through trials (Levine and Koltun, 2013; Kupcsik et al., 2013). Many methods have been developed to address the learning and control problems in robot manipulation (Kroemer et al., 2019), especially methods based on reinforcement learning (RL) (Stulp et al., 2012) and those combining deep networks and RL for learning flexible behaviors from complex data (Andrychowicz et al., 2018; Hausman et al., 2018). Recently, several attempts have been made to use deep neural networks (DNNs) for end-to-end learning of changing-contact manipulation tasks, bypassing the need to learn a separate policy for modeling dynamics (Nagabandi et al., 2020; Ajay et al., 2019). Although these methods reduce the need for domain models and prior knowledge, they require large labeled datasets, pose high-dimensional optimization problems, and tend to select the smoothest interpolation of the training data, which conflicts with the discontinuous impact dynamics of changing-contact manipulation.

The third approach is to combine the analytic and learning approaches discussed above. These forward models tend to learn the difference between the analytic model and the true dynamics of the interaction, capturing the corrections to the analytic model needed to match the observed environment (Gandhi et al., 2017). Despite their advantage of requiring less training data, these strategies still rely on several restrictive assumptions regarding the types of contacts, friction models, object dynamics, etc., require significant knowledge about the mathematical models to be used, and also require at least a few trials and examples of the possible real-world scenarios to be modeled. Even with sim-to-real strategies developed to further reduce training on real robots, aspects such as the dynamics of rigid bodies with friction are too complicated to be modeled in a real-time dynamics simulator (Johnson et al., 2016), and often several hundred real-world trials are needed to fine-tune the models for transfer to a physical robot (Ajay et al., 2019).

The piece-wise continuous nature of changing-contact tasks has been used by some methods to build hybrid or hierarchical models, with one level identifying the current “mode” while the other models the dynamics of that mode (Buşoniu et al., 2018). Planning methods for manipulation often consider the discontinuities but assume prior knowledge of the models and modes, and often require many synthetically generated training samples (Toussaint et al., 2018). Our framework, on the other hand, incrementally learns the dynamics of the robot’s interaction with its environment as a hierarchical model without prior knowledge (other than the target trajectory) of the dynamics.

2.2 Controllers for continuous contact tasks

In most practical control systems, including robot manipulators, the model of the “plant” (robot) is nonlinear or unknown. In the case of manipulators, the dynamics can be unmodelled or difficult to model due to interactions with other objects, unknown tools/objects at the end-effector, etc. This raises the need for control schemes that can handle system uncertainties, or that are adaptive and can change according to the system’s response and/or observations (measurements).

Adaptive control is a class of control strategies for systems whose parameters change or are unknown beforehand, adapting using feedback from the system. Common strategies include model-reference adaptive control (MRAC), self-tuning regulators, and gain scheduling. MRAC methods compute control signals that force the system’s behavior to match that of a reference model (Zhang and Wei, 2017). They often employ approximations of the system’s parameters and revise them using prior/observed data, but an incorrect reference model can result in instabilities.

Self-tuning regulators typically model the plant as a linear time-varying system, and adapt or estimate the controller parameters online (Pezzato et al., 2020). Although they can converge to an optimal controller under certain conditions and are a good option for systems whose dynamics vary in a fixed pattern across all trials, they often struggle to adapt to the changing objectives in robot manipulation. In repetitive tasks, however, the last parameter estimates from a previous run can be used as the initial estimates, which can then be improved in subsequent trials. The controller is thus “trainable” in repetitive tasks, yielding more accurate performance; in such settings, a controller can be designed or trained for every anticipated operating point. This type of control adaptation falls under the next category of adaptive control: gain scheduling.

Gain-scheduling methods are a popular option in robotics due to the increasing use of imitation learning and RL methods. These methods learn a time-indexed sequence of control parameters by repeating the task to update gains (Kramberger et al., 2018), or by learning gain profiles from training data (Lee et al., 2015). Many of the variable impedance strategies using reinforcement learning or learning-from-demonstration can be considered part of this category (Abu-Dakka et al., 2018; Calinon et al., 2010; Rozo et al., 2016). Variable impedance controllers in general provide a time-varying impedance profile for the robot controller during task execution; the impedance can change as a function of time, robot state, or any observation made by the robot. Since these methods often need large training datasets or comprehensive domain knowledge, more recent methods have the agent repeat a task until some measure of good performance is achieved (Gams et al., 2014; Kramberger et al., 2018). Conventional methods try to reduce trajectory-tracking error as well as improve rejection of periodic disturbances; this typically involves learning a corrective term for the control law that depends linearly on tracking error, measured disturbance, and/or time.

The adaptive variable impedance control (AVIC) we developed for navigating each contact mode is (at its core) similar to the self-tuning regulators, but it builds on our prior work in control for continuous-contact tasks (Mathew et al., 2019) and piecewise-continuous interaction dynamics (Sidhik et al., 2020); it uses the prediction error of an incrementally learned forward model to guide the online adaptation of the control parameters.

2.3 Control of hybrid systems

Several control methods have been proposed in the control theory literature for specific classes of hybrid systems. For instance, a wide range of literature can be found for switched systems where stabilizing controllers are developed using Lyapunov arguments (Johansson, 2003; De Schutter et al., 2009). Optimal control theory has been used for developing control approaches for hybrid systems in the context of manufacturing (De Schutter et al., 2009). Recently, model predictive control (MPC) has also been extended to some classes of hybrid systems (De Schutter et al., 2009).

The benefits of incorporating modes or phases into the design of controllers for manipulation are evident in grasping tasks (e.g., (Romano et al., 2011)), where the controller has distinct objectives and behavior requirements for phases such as ‘approach’, ‘grasp’ and ‘release’. Different strategies for sequencing motion primitives have also been used to solve manipulation tasks, but they assume the existence of a library of modes or motion primitives, or alternatively segment a sequence of primitives from human demonstrations (Niekum et al., 2013). These methods do not consider the interaction dynamics or try to reduce the effects of impact; moreover, this type of modeling makes the learned policy dependent on the environment, the movements, and the sequence of modes.

Motivated by the strengths and limitations of the work discussed in the previous sections, our framework uses an independent task-space adaptive variable impedance controller (AVIC) for each identified mode in the task. This makes the framework more generalisable and adaptable to new environments.

2.4 Modeling impacts & contact changes

Collisions and impact dynamics introduce critical challenges to the planning, modeling and control of robots in applications such as locomotion (Wieber et al., 2016) and manipulation (Kemp et al., 2007). Even a single collision is a complex interaction in which object interpenetration is prevented by material deformation, often occurring on a scale far below the resolution of practical sensors (Halm and Posa, 2021). Capturing these processes accurately requires an impractically precise set of knowledge regarding the materials, geometries, and initial conditions, on top of the complications of predicting dynamics using this information (Chatterjee, 1997). To avoid these challenges, most robotics approaches make several coarse approximations of contact dynamics, such as the rigid-body assumption, to make the problem more tractable (see (Brogliato, 2019) for background). When impacts occur, rigid-body models approximate the event as an instantaneous change in velocity due to an impulsive force. Even so, seemingly minor changes in the mathematical models can result in significantly different predictions from identical initial conditions, and in many cases no available model can capture the real-world behaviors (Fazeli et al., 2020; Stoianovici and Hurmuzlu, 1996).

Unrealistic contact models are a primary reason for the gap between simulated and real-world performance in robotics problems (Parmar et al., 2021). Furthermore, velocity measurements are extremely sensitive to the measurement time, as velocities change almost instantaneously during impacts. These issues become even more significant when learning a model of a real system from noisy sensor measurements. The smoothing effect of deep neural networks is therefore particularly harmful for modeling impacts and collisions. This is especially significant given the sparsity of reliable data points that can be collected during and around impact, compounded by the lower reliability of typical sensors in these regions. Parmar et al. (2021) provide an excellent discussion of the main challenges of using deep learning systems (as well as analytical methods) for modeling contact dynamics, an important one being the degradation of model performance with increasing stiffness.

Learning to model contacts directly is therefore a difficult problem for both analytical and learning-based methods. However, instead of modeling contact dynamics directly, the robot can decouple the problem by first approximating the positions of contact points, and then using a ‘safer’ controller in the predicted regions. Static contact properties such as contact positions and the direction of impact can be estimated more reliably (if the task plan is known beforehand) using either tactile/force-torque sensors or coarse depth images/point-cloud models of the objects. Acquiring training samples for such measurements is also usually easier than performing an analytical analysis of the interaction. In (Ugur and Piater, 2015), a robot acquires training labels autonomously by interacting with objects to learn high-level rules that can be used for planning. Such interactive perception techniques have been used in several scenarios, such as estimating constraints or physical properties of objects (Katz and Brock, 2011; Barragán et al., 2014). The benefits of interactive perception are that it does not necessarily need pre-training and it helps disambiguate between scenarios as well as observe otherwise latent properties (Kroemer et al., 2019). For instance, a robot can figure out whether an object is fixed or movable by pushing it. However, such methods usually require the robot to perform the task multiple times to build models or optimize model parameters, especially when trying to model the dynamic properties of the interaction. In our contact-anticipation model (Section 3.3), we focus on interactively improving the robot’s knowledge of the locations of the contacts involved in the task using a Kalman-filter-based update algorithm. These estimated contact locations are then used to define ‘transition’ regions in the workspace of the robot, where it can expect contact changes to occur and therefore use a safer control strategy.

2.5 Transition-phase controllers

Using separate controllers is a common strategy for handling contact changes. Methods that use a transition-phase controller for changing-contact manipulation tasks focus on minimizing the discontinuities in the dynamics such that the controller is stable and tracks accurately after the transition (Mills and Lokhorst, 1993; Marth et al., 1993; Sidhik et al., 2020). However, these methods generally switch to a different controller only after a contact is detected, which can result in significant disruptions in the dynamics when the switch is made. Such transitions inject a large amount of energy into the system, producing large spikes in forces and accelerations, and could potentially damage the robot or the domain objects. In contrast to these methods, our contact-change-handling module aims to predict the contacts and use these predictions to transition smoothly to a safer ‘transition-phase’ controller on the fly, by interpolating between the control outputs before the collision occurs. Hyde et al. (1997) use a transition-phase controller with a constant low velocity to reduce impact effects in the guard regions of manipulation tasks modeled as hybrid systems. Unlike their approach, however, our transition-phase controller modifies both velocity and stiffness to reduce impacts and vibrations during transitions, and can also automatically select the approach velocity to attain a desired impact force on collision (Section 3.5.1).

For the smallest delays and deviation from the original plan, the new approach velocity of the transition-phase controller should come into effect only when the robot is about to make contact. Modifying the velocity requires modifying the timeline of the provided kinematic plan, thereby deviating from the original plan (in terms of following its timeline) and sacrificing some tracking accuracy. However, motion smoothness (in motion derivatives at least up to jerk) can be guaranteed by making the transition trajectory continuous. To make the motion smooth in acceleration and jerk, a velocity profile that is at least $C^2$ smooth is required. For kinematically time-optimal motion, different variants of trapezoidal velocity profiles are commonly used (Biagiotti and Melchiorri, 2008). Several methods that create smooth trajectories using multiple trajectory segments have also been developed (Ahn et al., 2004; Nam and Yang, 2004), and several minimum-jerk motion profiles can be found in the literature (Piazzi and Visioli, 2000; Freeman, 2012). A trapezoidal smooth motion profile for point-to-point motion is described in (Grassmann et al., 2018) using a seventh-order polynomial function as the velocity profile. Such methods typically have many hyper-parameters that are tuned specifically for the task and require knowledge of the limits of the system’s jerk and higher-order motion derivatives. In contrast, our framework uses a novel velocity profile for switching to the transition-phase controller, which is simpler in formulation, has no additional hyperparameters, and has continuous derivatives of all orders at every point of the function, making it $C^\infty$ smooth (Section 3.5.1).

3 Framework Description

Our overall framework takes as input the task-space target (i.e., desired) motion pattern and the force control target profile for the robot to follow. In the absence of our framework, the robot would follow the provided task plan using a position controller, without compensating for the changing interaction dynamics as the task progresses. This means that the robot would have to use a high-stiffness controller within a contact mode to counter the continuously varying environment dynamics. Similarly, there would be large discontinuities in the motion dynamics when the contact modes change (i.e., when the robot makes/breaks contact or when the environment changes discretely), and high impacts during collisions could damage the robot and/or the objects involved. Our framework avoids these problems by modeling the piecewise-continuous interaction dynamics during manipulation using a hierarchical learning framework, which uses an adaptive controller within a contact mode capable of modeling and compensating for continuously changing environments. The framework also learns to predict mode changes before they happen, so that the robot can switch to a safer transition-phase controller during mode transitions, reducing the discontinuities at these transitions.

Figure 2: Overview of the framework for smooth control of changing-contact manipulation tasks.

Figure 2 presents an overview of our framework. The inputs are the desired motion trajectory, the force-torque sensor measurements, and the end-effector position. The default controller is the hybrid force-motion adaptive variable impedance controller (for hybrid dynamical systems) described in Sections 3.1 and 3.2. The robot first identifies the contact mode it is in and uses the appropriate forward model and controller for navigating that mode. It predicts the ‘region of anticipated mode transition’ in the task using the Kalman filter-based contact anticipation model described in Section 3.3, which incrementally updates its contact predictions. These predictions are used to minimize the time spent in the transition phase, and the controller parameters to be used in the transition phase are set adaptively to achieve a smooth motion profile (and a desired impact force if collisions are expected), as explained in Sections 3.4 and 3.5. Once the transition is completed using a suitable controller that minimizes discontinuities, the robot identifies the new mode, uses the appropriate default controller, and revises the parameter values suitably.

3.1 Control Strategy and Learning Dynamics in a Mode

For any given task, the desired motion trajectory is provided as a sequence of mappings from time to the end-effector pose and force (for force control). It is obtained through a single demonstration of the task by a human moving the manipulator. Since the controller operates in Cartesian space, the trajectory is in the form of segments that are each assumed to be smooth, continuous, and jerk-free. Transitions between segments are accompanied by a change in the direction of (force, motion) control, and the provided plan does not account for collisions (i.e., contact points).

The basic controller in our framework follows the standard impedance control formulation with a force control term, making it a hybrid force-impedance controller:

$$\mathbf{u}_t = \mathbf{F}^{ff}_t + \boldsymbol{\eta} + \mathbf{K}_s \Delta\mathbf{x}_t + \mathbf{K}_d \Delta\dot{\mathbf{x}}_t + \mathbf{F}^{f}_t \tag{1}$$

where $\mathbf{u}_t$ is the robot’s control command (i.e., task-space force) at time $t$; the term $\boldsymbol{\eta}$ denotes the other dynamics compensation terms (inertia, Coriolis and gravity); $\mathbf{K}_s$ and $\mathbf{K}_d$ are the (positive definite) stiffness and damping matrices of the feedback controller for motion; $\mathbf{F}^{f}_t$ is the force feedback control command used to achieve orthogonal force targets if desired; and $\Delta\mathbf{x}_t$ and $\Delta\dot{\mathbf{x}}_t$ are the errors in the end-effector position and velocity at each instant. In the absence of external disturbances, the feed-forward term $\mathbf{F}^{ff}_t$ can be zero. However, if there are external wrenches acting on the end-effector, a good forward model can provide appropriate values of $\mathbf{F}^{ff}_t$ that help the robot follow the trajectory accurately without being affected by those disturbances.

In our formulation, the forward model of each contact mode is learned as the robot attempts to follow the desired trajectory for the given task. To avoid explicit dependence of the forward model on time, a Gaussian Mixture Model (GMM) is fit over points of the form $(\mathbf{s}_{t-1}, \boldsymbol{\phi}_t)$, where $\mathbf{s}$ can be any combination of features that uniquely represent the robot’s state for the task, and $\boldsymbol{\phi}_t$ is the interaction effects felt by the robot at its end-effector at time $t$. $\mathbf{s}$ can contain information about end-effector pose ($\mathbf{p}$), velocity ($\mathbf{v}$), forces ($\mathbf{F}$), etc., while $\boldsymbol{\phi}$ would be measurable interaction effects such as end-effector forces ($\mathbf{F}$) and torques ($\boldsymbol{\Gamma}$).

In our framework, we aim to predict the end-effector forces and torques from previous measurements of end-effector velocities and wrenches. However, instead of their 3D vector representations we use the magnitudes of force, torque, and end-effector velocity (linear and angular separately) for learning and prediction. Since the magnitudes of frictional forces and torques are independent of the direction of motion (for objects having consistent friction properties) and depend only on the relative speed of motion of the objects, this simplified representation is sufficient to learn and predict the end-effector forces and torques along the direction of motion. This reduced representation of forces and torques makes the learning process simpler, more computationally efficient, and also independent of the direction of motion. The learned model always predicts the forces and torques along (or against) the direction of motion. Since the end-effector’s direction of motion is always known, the components of force and torque along the axes of motion can be recovered when needed. The final state space where the forward model is learned can therefore be represented as $\mathcal{D} = \{(\mathbf{s}_{t-1}, \boldsymbol{\phi}_t)\}$, with:

$$\mathbf{s}_t = \left[\, \lVert\mathbf{v}_t\rVert,\ \lVert\boldsymbol{\omega}_t\rVert,\ \lVert\mathbf{F}_t\rVert,\ \lVert\boldsymbol{\Gamma}_t\rVert \,\right] \tag{2}$$
$$\boldsymbol{\phi}_t = \left[\, \lVert\mathbf{F}_t\rVert,\ \lVert\boldsymbol{\Gamma}_t\rVert \,\right] \tag{3}$$

The predictions from the forward model then provide the feed-forward term of the controller (Eq. 1) that cancels out the effect of the predicted wrenches during motion, giving the control equation:

$$\mathbf{u}_t = \alpha_t \hat{\boldsymbol{\phi}}^{m}_t + \boldsymbol{\eta} + \mathbf{K}_s(t)\, \Delta\mathbf{x}_t + \mathbf{K}_d(t)\, \Delta\dot{\mathbf{x}}_t + \mathbf{F}^{f}_t \tag{4a}$$
$$\mathbf{K}_s(t) = \mathbf{K}_{min} + (1 - \alpha_t)\left(\mathbf{K}_{max} - \mathbf{K}_{min}\right) \tag{4b}$$
$$\alpha_t = \frac{1}{1 + e^{\,k\,(\epsilon_t - \epsilon_0)}} \tag{4c}$$

where $\alpha_t \hat{\boldsymbol{\phi}}^{m}_t$ is the weighted feed-forward wrench (end-effector forces and torques) predicted by the forward model associated with the present mode $m$. The factor $\alpha_t$, a function of the accuracy of the forward model at instant $t$, maps the error in prediction from the forward model ($\epsilon_t$) to a value between 0 and 1, e.g., the logistic function in Equation 4c. The logistic growth rate $k$ and the sigmoid midpoint $\epsilon_0$ are hyper-parameters tuned for the task. This control law relies on the feed-forward term only if the forward model’s predictions are found to be accurate; if not, the feedback control term is prioritized. Equation 4b describes the adaptation of the stiffness parameters as a function of the prediction accuracy: $\mathbf{K}_{max}$ is the maximum allowed stiffness, and $\mathbf{K}_{min}$ is the minimum stiffness for accurate position tracking in the absence of external disturbances (i.e., free-space motion). The damping term is updated as $\mathbf{K}_d = 2\sqrt{\mathbf{K}_s}$, treating the manipulator as a critically damped system. This way, the robot can follow the trajectory accurately using high feedback gains if the forward model is unreliable, but can be compliant when the disturbances are compensated well by the feed-forward term.
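To make the gain-adaptation rule concrete, the following is a minimal Python sketch of Equations 4a-4c. The function names (`reliability_weight`, `impedance_command`) and all numeric gains are illustrative assumptions, not values from the paper, and stiffness/damping are treated as scalars for brevity where the controller above uses matrices.

```python
import numpy as np

def reliability_weight(pred_error, k=50.0, eps0=0.1):
    """Map the forward model's prediction error to a weight in (0, 1)
    (Eq. 4c): close to 1 for small errors (model trusted), close to 0
    for large ones. k and eps0 are task-tuned hyper-parameters."""
    return 1.0 / (1.0 + np.exp(k * (pred_error - eps0)))

def impedance_command(dx, dx_dot, phi_hat, f_force, eta, alpha,
                      K_max=500.0, K_min=50.0):
    """One step of the adaptive variable impedance law (Eqs. 4a-4b):
    alpha discounts the feed-forward wrench prediction phi_hat, the
    stiffness drops toward K_min as the model becomes reliable, and
    the damping keeps the system critically damped."""
    K_s = K_min + (1.0 - alpha) * (K_max - K_min)
    K_d = 2.0 * np.sqrt(K_s)
    u = alpha * phi_hat + eta + K_s * dx + K_d * dx_dot + f_force
    return u, K_s, K_d
```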

To incrementally update the forward model of our framework using new measurements during task execution, we used an online variant of GMM called the Incremental GMM (IGMM) (Song and Wang, 2005; Engel and Heinen, 2010; Ahmad, 2006). IGMM can update model parameters and add components to the mixture model using measures of closeness and frequency defined by user-set hyperparameters; for details of the incremental algorithm, readers are referred to (Engel and Heinen, 2010). IGMM internally uses a variant of the Expectation-Maximization (EM) algorithm to fit the model and maximize the following likelihood function:

$$\mathcal{L}(\Theta \mid \mathcal{D}) = \prod_{t=1}^{N} \sum_{j=1}^{M} \pi_j\, \mathcal{N}\!\left(\mathbf{x}_t \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\right) \tag{5}$$

where $\Theta = \{\pi_j, \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\}_{j=1}^{M}$ are the parameters of the $M$ components of the GMM, and $\mathcal{D} = \{\mathbf{x}_t\}_{t=1}^{N}$ represents the points to be fit, with $\mathbf{x}_t = (\mathbf{s}_{t-1}, \boldsymbol{\phi}_t)$. Each point contains information about the previous end-effector state, along with the current wrench. Once trained, the forward model provides a function:

$$\hat{\boldsymbol{\phi}}_{t+1} = f(\mathbf{s}_t) \tag{6}$$

that predicts $\boldsymbol{\phi}$ at the next time step as a function of the current (measured) value of $\mathbf{s}$, using Gaussian Mixture Regression (GMR) (Sung, 2004). The incremental nature of our dynamics model allows the robot to capture the smoothly varying interaction dynamics between the robot end-effector and the environment within a contact mode.
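The regression step of Equation 6 can be illustrated with standard GMR. The sketch below conditions each component of a fitted joint GMM on the observed state and mixes the conditional means; it uses scikit-learn's batch `GaussianMixture` purely as a stand-in for the IGMM of (Engel and Heinen, 2010), whose incremental update is not reproduced here. The name `gmr_predict` and the dimension-splitting convention are our own assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # stand-in for IGMM

def gmr_predict(gmm, s, d_in):
    """GMR (Eq. 6): condition each Gaussian component of the joint model
    over (s_{t-1}, phi_t) on the observed state s, then mix the
    conditional means by the components' posterior responsibilities."""
    resps, cond_means = [], []
    for pi, mu, S in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_s, mu_p = mu[:d_in], mu[d_in:]
        S_ss, S_ps = S[:d_in, :d_in], S[d_in:, :d_in]
        diff = s - mu_s
        L = np.linalg.cholesky(S_ss)              # S_ss = L L^T
        z = np.linalg.solve(L, diff)
        log_pdf = (-0.5 * z @ z - np.log(np.diag(L)).sum()
                   - 0.5 * d_in * np.log(2.0 * np.pi))
        resps.append(pi * np.exp(log_pdf))        # p(component, s)
        cond_means.append(mu_p + S_ps @ np.linalg.solve(S_ss, diff))
    resps = np.asarray(resps) / np.sum(resps)     # posterior weights
    return sum(r * m for r, m in zip(resps, cond_means))
```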

3.2 Contact Mode Recognition and Identification

In a typical manipulation task, the interaction dynamics undergoes a non-smooth transition when a contact change occurs, moving the state of the hybrid piecewise continuous system to a new contact mode. Our approach for recognizing known modes and identifying new ones is based on the observation that any change in mode is accompanied by a sudden significant change in the sensor readings. In our framework, the robot responds to pronounced changes in force-torque measurements by briefly using a high-stiffness control strategy while quickly obtaining a batch of sensor data to confirm and respond to the transition. The robot learns a new dynamics model if a new mode is detected, and transitions to (and revises) an existing dynamics model if the transition is to a known mode.

The key factor influencing the reliability and generalizability is the choice of feature representation for the modes. This representation is task-dependent and should uniquely identify the different contact modes in the task. For the task of sliding an object over surfaces with different values of friction (Figure 8), for instance, the property that strongly influences the end-effector forces is the friction coefficient between the object and the surface. When two objects slide over each other at constant velocity, the frictional force $\mathbf{F}^{fr}$ is proportional to the applied normal force ($\mathbf{F}^{\perp}$) and the friction coefficient ($\mu$) (assuming the relative orientation of their surface normals does not change); $\mu$ can then be estimated as:

$$\mu = \frac{\lVert \mathbf{F}^{fr} \rVert}{\lVert \mathbf{F}^{\perp} \rVert} \tag{7}$$

A concise feature representation for this task is thus $\mu$, which has the effect of making mode classification independent of the magnitude of the applied force.

The management of modes is based on an online incremental clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) (Zhang et al., 1997), as implemented in the Scikit-learn library (Pedregosa et al., 2011). This algorithm incrementally clusters incoming data under given memory and time constraints, without examining all data points or clusters. Each cluster is considered to represent a mode in an abstract feature space, with the clusters updated using batches of the feature data. The fraction of the input feature vectors assigned to any cluster determines the confidence in the corresponding mode being the current mode. If the highest such confidence value is above a threshold, the dynamics model of that mode is used and revised until a mode change occurs. If the feature vectors are not sufficiently similar to an existing cluster, a new cluster (i.e., mode) and the corresponding dynamics model are constructed and revised (Section 3.1) until a mode transition occurs.
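A minimal sketch of this mode-management loop using scikit-learn's `Birch` is shown below. The threshold, the confidence computation (fraction of the batch assigned to the dominant subcluster), and the function name are illustrative assumptions; the paper's exact bookkeeping for spawning modes and their dynamics models is task-tuned.

```python
import numpy as np
from sklearn.cluster import Birch

# One BIRCH subcluster per contact mode, updated from batches of the
# task-dependent mode feature, e.g., mu = |F_friction| / |F_normal|.
clusterer = Birch(threshold=0.05, n_clusters=None)

def identify_mode(feature_batch, conf_thresh=0.8):
    """Return (mode_id, is_new) for a batch of feature vectors collected
    right after a suspected transition; a batch no existing cluster
    claims confidently is treated as a new mode."""
    if not hasattr(clusterer, "subcluster_centers_"):
        clusterer.partial_fit(feature_batch)   # first mode of the task
        return 0, True
    labels = clusterer.predict(feature_batch)
    modes, counts = np.unique(labels, return_counts=True)
    conf = counts.max() / len(labels)          # fraction in best cluster
    clusterer.partial_fit(feature_batch)       # refine (or grow) clusters
    if conf >= conf_thresh:
        return int(modes[counts.argmax()]), False
    return None, True                          # new mode: build new model
```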

Input: Desired motion pattern as a sequence of task-space way-points; control parameters $\mathbf{K}_{max}$, $\mathbf{K}_{min}$; dynamics models $\{f_m\}$ corresponding to known modes; current mode $m$.
1   while motion pattern not complete do
2       if mode transition detected then
3           Set high stiffness: $\mathbf{K}_s = \mathbf{K}_{max}$
4           Detect (new or existing) mode: $m$ = detect_classify_mode()
5           if new mode found then
6               Populate a new forward model $f_m$ for mode $m$
7           end if
8       end if
9       Update $f_m$ online and use it for control in the identified mode (Eq. 4)
10  end while
Algorithm 1 Control loop for hierarchical model learning of piecewise-continuous dynamics

Algorithm 1 is an overview of the framework’s control loop for a changing-contact manipulation task, e.g., sliding an object on a surface; it proceeds until the desired motion pattern is completed. The mode-detection module is triggered when a contact change is detected via substantial changes in the force-torque sensor measurements (line 2). The robot responds by setting a high stiffness (line 3), collecting sensor measurements, determining whether the transition is to a new or an existing mode (line 4), and creating new models if necessary (lines 5-7). In the absence of a mode transition (e.g., if the detected change in sensor measurements was an anomaly), the robot continues with the current mode and dynamics model (line 9).

3.3 Predicting contact changes

In a static environment, a manipulator robot may experience discontinuities in dynamics (contact changes) of two types: (i) impact transitions (collisions), and (ii) impact-less transitions.

Collisions occur when a moving robot (end-effector) comes in contact with a fixed object in its workspace. Given that the task plan is known in a fixed static environment, there are several methods of estimating the contact locations for such mode changes. Unless occluded or visually indistinguishable, visual sensors are prime candidates for providing rough initial estimates of where collisions can occur in a planned motion. An important characteristic of a collision-based mode change is that it always removes at least one degree of freedom (DoF) of the robot in a functional coordinate space. Collisions are marked by large spikes in force-torque measurements, as a function of the relative velocity of approach as well as material properties such as the coefficient of restitution and the hardness of the objects involved. Furthermore, collisions also introduce a discontinuity in the velocity of the end-effector by forcing at least one dimension in the functional coordinates to go to zero, causing spikes in acceleration, jerk, etc. All these effects can be attributed to the sudden loss of a degree of freedom in the functional coordinate space.

The second type of contact change occurs when interaction discontinuities arise from discrete changes in the environment dynamics that are not due to impact. A simple example is a robot sliding across two surfaces with different friction coefficients (Figure 1(a)); here, the robot experiences sudden changes in frictional resistance as it crosses the boundary between the surfaces, but the transition does not necessarily cause large impact forces or a drop in velocity. The main feature that distinguishes impact-less transitions from collisions is that these mode transitions occur without a loss of a degree of motion freedom.

Anticipating contacts (collisions) by predicting impact forces or time to collision is challenging because these parameters are influenced by the robot dynamics and controller parameters; e.g., reducing the velocity or stiffness reduces the impact force and increases the time to contact. Static contact parameters such as the end-effector position at impact and the direction of the contact force can be predicted more reliably; they do not change significantly across task repetitions under the reasonable assumption that the trajectory and environmental attributes do not change significantly between repetitions.

In our contact anticipation model, the robot’s belief about the position of each expected contact location in the assigned motion trajectory is modeled as a multivariate Gaussian, with the covariance ellipsoid denoting the uncertainty and defining the “region of anticipated mode transition” $\mathcal{R}$. In the context of this work, a ‘contact position/location’ refers to a position in the task space of the robot where a mode switch occurs when following the provided task plan. Therefore, in any trial of the task, the robot expects a specific contact position to lie within the corresponding $\mathcal{R}$. Since the controller operates in the task space, each contact location’s representation is compact and is updated over very few trials of the task using a Kalman filter with the state update equation $\mathbf{x}_{t+1} = \mathbf{A}\mathbf{x}_t + \mathbf{B}\mathbf{u}_t + \mathbf{w}_t$, where $\mathbf{x}_t$ is the contact position, $\mathbf{A}$ is the object’s self-activation ($\mathbf{A} = \mathbf{I}$ for positively activated objects), $\mathbf{B}$ is the control matrix capturing the effect of action $\mathbf{u}$ on the contact position, and $\mathbf{w}_t$ is Gaussian noise modeling the uncertainty in the contact location. The sensor model uses the end-effector pose (obtained by forward kinematics from the joint positions) as the measurement when a contact is detected; the noise in the sensor model depends on the joint encoder noise and the forward kinematics. The corrected estimate of the contact point results in a reduced covariance ellipsoid for subsequent trials. Although this representation supports contact with movable objects, we assume in this paper that the end-effector only makes contact with stationary objects. These simplifications result in Gaussian updates using the noisy measurements based on the robot’s kinematics model (since the sensor input is from an FT sensor) each time the robot experiences a contact change.
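Under the stationary-object simplification above, the Kalman filter reduces to fusing each measured contact position into a Gaussian belief. The following sketch makes our assumptions explicit (3D position state, identity state transition, isotropic noise); the class name and noise magnitudes are illustrative, not values from the paper.

```python
import numpy as np

class ContactEstimate:
    """Gaussian belief over one contact location (Section 3.3 sketch).
    For stationary objects the state transition is the identity, so each
    detected contact simply fuses the end-effector position measured at
    the moment of contact into the belief."""

    def __init__(self, mu0, sigma0, process_noise=1e-6, meas_noise=1e-4):
        self.mu = np.asarray(mu0, dtype=float)      # mean contact position
        self.P = np.eye(3) * sigma0                 # covariance (region R)
        self.Q = np.eye(3) * process_noise
        self.R = np.eye(3) * meas_noise

    def predict(self):
        self.P = self.P + self.Q                    # static: x_{t+1} = x_t

    def correct(self, z):
        """Fuse a measured contact position z (forward kinematics at the
        detected contact); the covariance shrinks, tightening R."""
        K = self.P @ np.linalg.inv(self.P + self.R)  # Kalman gain
        self.mu = self.mu + K @ (np.asarray(z, dtype=float) - self.mu)
        self.P = (np.eye(3) - K) @ self.P
        return self.mu, self.P
```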

3.4 Transition-phase controller for contact changes

Next, we describe the transition controller for safe and smooth dynamics during mode transitions. The transition controller follows the same control structure as our default variable impedance controller (Eq. 4a) and varies only in the choice of the control parameters. Depending on the type of mode transition (collisions or impact-less), the choice of the control parameter values can vary, but the basic structure remains the same:

$$\mathbf{u}_t = \alpha_t \hat{\boldsymbol{\phi}}^{m}_t + \boldsymbol{\eta} + \mathbf{K}^{T}_{s} \Delta\mathbf{x}_t + \mathbf{K}^{T}_{d} \Delta\dot{\mathbf{x}}_t + \mathbf{F}^{f}_t \tag{8}$$

where $\mathbf{K}^{T}_{s}$ and $\mathbf{K}^{T}_{d}$ are the stiffness and damping parameters of the transition-phase controller, which can vary depending on the type of mode transition. The desired properties of the transition-phase controller for the two types of mode transitions are described in the next sections.

3.4.1 Transition-phase controller for collisions:

Since the permitted impact force may differ based on the task, e.g., large forces can damage delicate objects, we imposed a limit on the maximum allowed impact force. We also found experimentally that reducing the controller stiffness helps reduce the jerk in motion after impact by providing compliance, but has no significant effect on the impact force itself, because the error and stiffness terms in the feedback control loop come into effect only after contact is made. A safe controller should thus have lower stiffness to reduce vibrations. In addition, the approach velocity was observed to be directly proportional to the impact force, especially when the robot registered a contact while moving in free space. We exploited these insights in our framework and used a simple linear (regression) model to capture the relationship between impact force and approach velocity for a pair of objects. This model was then used to compute the approach velocity for any (given) desired force on impact.

Since the robot may not initially have a model of the (linear) relationship discussed above, it starts with a safe low velocity during the first trial of any given task and a target force on impact. It then uses the difference between the target and measured force on impact to revise the approach velocity for the next iteration of the task:

$$\Delta v = \gamma \left( F^{des} - F^{meas} \right) \tag{9}$$

where $\Delta v$ is the change in approach velocity, $F^{des}$ is the desired impact force along the motion direction, $F^{meas}$ is the measured impact force, and $\gamma$ is a learning rate that is ideally less than or equal to the slope of the function relating impact force to approach velocity. Over time, this method enables the robot to learn a task-specific approach velocity for a desired impact force.
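A one-line sketch of this per-trial update (Eq. 9); the function name and default learning rate are illustrative assumptions:

```python
def revise_approach_velocity(v_prev, f_desired, f_measured, gamma=0.001):
    """Eq. 9: after each trial, nudge the approach velocity in proportion
    to the impact-force error; gamma should respect the slope bound of
    the learned force-velocity relation."""
    return v_prev + gamma * (f_desired - f_measured)
```

Applied once per trial, the approach velocity converges toward the value that yields the desired impact force, provided the learning rate respects the bound above.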

3.4.2 Transition-phase controller for impact-less mode transitions:

If the robot expects an impact-less transition in $\mathcal{R}$, it does not have to reduce its velocity and can continue following the original plan; reducing the velocity would introduce delays in the task, which is undesirable. However, a high-stiffness controller would help in identifying new modes after the transition, because high stiffness ensures that the robot motion is steady and the sensor measurements are more accurate, which aids mode identification. For this reason, the ideal transition-phase controller for impact-less transitions switches to a high-stiffness controller. The parameter values for this transition-phase controller (Eq. 8) in our framework are set to the maximum allowed values of the active adaptive variable impedance controller (base controller), i.e., $\mathbf{K}^{T}_{s} = \mathbf{K}_{max}$.

3.5 Smooth Transition between Controllers

The robot has to switch to a transition-phase controller smoothly as it enters the region of anticipated mode transition $\mathcal{R}$. To avoid discontinuities in motion dynamics, the robot needs to smoothly transition from a normal (base) controller with output $\mathbf{u}^{b}_t$ to the appropriate transition-phase controller with output $\mathbf{u}^{T}_t$. Linear interpolation of $\mathbf{u}^{b}_t$ and $\mathbf{u}^{T}_t$ over a time window is possible since they share the same task-space representation, and it produces a smooth transition between these controllers:

$$\mathbf{u}_t = \left(1 - \frac{t}{T}\right)\mathbf{u}^{b}_t + \frac{t}{T}\,\mathbf{u}^{T}_t, \qquad t \in [0, T] \tag{10}$$

where $T$ is the desired duration of the transition between the controllers. As long as the outputs from the two controllers ($\mathbf{u}^{b}_t$ and $\mathbf{u}^{T}_t$) are individually smooth, the output of the combination will also be smooth. In this work, both controllers use the task-space representation described earlier, with $\mathbf{u}^{T}_t$ being the output of the fixed transition-phase controller (low- or high-gain depending on the transition type) as the arm approaches the contact point. We use this strategy to smoothly transition between controllers of different stiffnesses such that the transition is completed by the time the robot reaches $\mathcal{R}$. A similar approach is used to smoothly transition from the transition-phase controller back to a normal controller after contact is made.
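The blending step of Equation 10 is straightforward; in the sketch below (with illustrative names), `u_base` and `u_trans` are the task-space commands of the two controllers and `T` is the transition window:

```python
def blended_command(u_base, u_trans, t, T):
    """Linearly interpolate two task-space control outputs over a window
    of duration T (Eq. 10); the blend is smooth whenever both controller
    outputs are individually smooth."""
    w = min(max(t / T, 0.0), 1.0)   # clamp to [0, 1]
    return (1.0 - w) * u_base + w * u_trans
```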

3.5.1 Velocity Profile Shaping:

Recall that transition-phase controllers for handling collisions use a lower velocity than the one used in the original kinematic sequence, to reduce the force on impact. Also, the switch to this controller will take place at different points in the trajectory as the region $\mathcal{R}$ is revised over time. The trajectory’s timeline thus has to be modified to account for the modified velocity profile. To achieve this, we enable the robot to create a new velocity profile and time mapping. Our formulation results in motion that is smooth and continuous at all orders, i.e., it is $C^\infty$ smooth.

Without loss of generality, assume that the motion is along one dimension with velocity $v_0$. Assuming that the transition starts at time $t_0$ with velocity $v_0$ and has to be completed at $t_f$ with velocity $v_f$ as the robot crosses the boundary point of $\mathcal{R}$, our velocity profile is defined as:

$$v(t) = v_0 + (v_f - v_0)\,\frac{h(\sigma)}{h(\sigma) + h(1 - \sigma)}, \qquad h(x) = \begin{cases} e^{-1/x} & x > 0 \\ 0 & x \le 0 \end{cases} \tag{11}$$

where $\sigma = (t - t_0)/(t_f - t_0)$. For $h(x)$ so defined, $v(t)$ has continuous derivatives of all orders at every point on the real line. Since the denominator $h(\sigma) + h(1-\sigma)$ is strictly positive for all points in its domain and velocity limits are enforced, this profile provides a smooth transition from $v_0$ to $v_f$ over $[t_0, t_f]$ and is $C^\infty$ continuous despite its piece-wise definition. Acceleration and jerk are computed as first- and second-order derivatives of $v(t)$ with respect to $t$, and the position trajectory is obtained by integrating the profile; all motion derivatives are continuous (see Figure 3).
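The profile of Equation 11 can be sketched as follows, following the smooth-step construction described above; the helper names and the numerical floor guarding the exponential are implementation conveniences, not part of the formulation:

```python
import numpy as np

def _h(x):
    # Smooth bump: e^{-1/x} for x > 0, else 0; all derivatives vanish at 0.
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, np.exp(-1.0 / np.maximum(x, 1e-12)), 0.0)

def smooth_velocity(t, t0, tf, v0, vf):
    """C-infinity velocity profile between (t0, v0) and (tf, vf) (Eq. 11);
    returns v0 before t0 and vf after tf."""
    s = (np.asarray(t, dtype=float) - t0) / (tf - t0)
    blend = _h(s) / (_h(s) + _h(1.0 - s))   # denominator strictly positive
    return v0 + (vf - v0) * blend
```

Because `_h` and all its derivatives vanish at zero, the blend, and hence the resulting velocity, acceleration, and jerk profiles, are smooth at both ends of the transition window, matching the behavior shown in Figure 3.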

Figure 3: Velocity plots with matched position, acceleration, and jerk plots. Velocity varies from 1.2 to 0.5 in unit time.

4 Experimental evaluation

In this section, we experimentally demonstrate our framework’s capabilities. We first evaluate the effectiveness of the overall framework in producing smooth overall motion in the presence of collisions and impact-less mode transitions in Section 4.5. We then evaluate the individual components of the framework, comparing against baselines from the literature wherever possible. We first demonstrate the need for an incrementally updated model for control in continuously and smoothly varying environments (Section 4.1). We then show the advantage of our adaptive variable impedance controller in such (continuous contact) environments by comparing it with other adaptive control strategies (Section 4.2). Next, we demonstrate experimentally the ability of the hierarchical model learning framework to identify and model different contact modes in tasks where the interaction dynamics change discretely (Section 4.3). Section 4.4 experimentally justifies the choices made in the formulation of our hybrid framework for handling piece-wise continuous dynamics, comparing its performance with a sophisticated long-term prediction algorithm as the baseline in a 3D simulation domain involving a 7-DoF robot arm.

Our framework is tested in the context of a physical 7-DoF robot performing manipulation tasks that involve multiple contact changes (collisions and/or impact-less mode transitions). Comparisons with baseline methods are set up in simulation domains where the interaction dynamics can be controlled and repeated. In most experiments, the performance measure is the accuracy (error) in following the desired trajectory, along with qualitative measures such as the smoothness of the interaction dynamics and delays in task completion.

The primary hypotheses tested in this section are:

  • H1: Incrementally updated adaptive models are required to capture smoothly changing and/or unmodelled interaction dynamics in a contact mode; this helps our adaptive variable impedance controller perform better than traditional adaptive control strategies in such environments;

  • H2: Our hybrid hierarchical model learning strategy for piecewise-continuous dynamics detects and models contact modes reliably and efficiently;

  • H3: Corrective system feedback in the task space and online adaptation of hybrid models (as in our hierarchical model learning) are required to effectively capture piecewise-continuous interaction dynamics in changing-contact manipulation;

  • H4: The contact-change-handling module of our framework reliably predicts contact changes and provides smooth motion during mode transitions.

4.1 Incremental models for continuously changing dynamics

Our prior work on smooth control for continuous dynamics was evaluated on a seven-degree-of-freedom (DoF) robot arm performing continuous contact tasks (Mathew et al., 2019). To demonstrate the importance of incrementally adapting the forward model to continuous changes in the environment (H1), we compare our incrementally trained forward model with a fixed GMM model pre-trained on the same data. We used a custom-built 2D simulated environment (Figure 1(c)) based on the Box2D physics engine (Catto, 2017) with noiseless measurements.

(a) Predictions from fixed GMM forward model.
(b) Predictions from incremental GMM model.
Figure 4: Comparison of fixed and incremental GMM models in “porridge” environment; red: predicted values; green: actual measurements.

We simulated a porridge-stirring task in which the viscosity increases continuously as the robot moves. The robot was made to move along a circular path in an environment where the viscosity increases continuously by a small fixed amount at every simulation step (the full circular trajectory had 600 steps). The measurements from this experiment were used to build a fixed GMM model for one-step prediction of end-effector forces from the previous end-effector velocity and force. This model’s predictions in an environment with continuously increasing viscosity are shown in Figure 4(a). We observe that the model predicts the forces accurately when the measurements are similar to the training data, but fails when the measured values differ substantially from the training data. Even when the predictions are accurate, this is largely due to the use of a constant high-stiffness controller. Although we do not show it here, when this learned model is used to guide the variable impedance behavior (Eq. 4b), it causes unreliable trajectory tracking. Also, for the pre-trained model to be useful, the training data would need to include samples from different regions of the state space, imposing additional training time and memory requirements. These results indicate the need for an incrementally learned forward model to account for continuously changing interaction dynamics, supporting hypothesis H1.

4.2 Comparison with adaptive control methods

To further test H1, we compared the capabilities of the adaptive variable impedance controller (base controller) of our framework with three adaptive control methods in the presence of continuously and smoothly varying interaction dynamics.

4.2.1 Comparison with MRAC:

We used an established MRAC implementation as the baseline (Tarokh, 1991). Recall that MRAC methods adapt their control law based on a pre-defined reference model. The reference model in the chosen MRAC method uses a second-order spring-damper system for each dimension, parameterized with user-defined values for the desired natural frequency $\omega_n$ and damping ratio $\zeta$. The control law continuously adapts its control parameters to minimize tracking errors.

Figure 5: Performance of MRAC is highly dependent on the design of the reference model.
Figure 6: Trajectory tracking comparison between AIC and our base controller.

MRAC was tested in a “multi-spring” environment where the robot had to move along a circular trajectory while the end-effector is pulled by three springs of different stiffness (Figure 0(c)). Tuning the large number of hyperparameters was found to be quite cumbersome for continuously varying environments; e.g., in Figure 5, trajectory tracking is affected by the choice of the reference model's natural frequency parameter, with stability adversely affected when its value is changed from 1.5 to 1. However, hyper-parameter selection for the “porridge” environment proved difficult, and performance comparable to that of our framework could not be achieved using MRAC.

4.2.2 Comparison with a self-tuning regulator:

As an example of a self-tuning controller, we chose the Active Inference Control (AIC) method (Pezzato et al., 2020). This adaptive control strategy supports continuous updates for changing environmental dynamics: the control parameters are iteratively updated to compensate for unaccounted-for dynamics in order to reach a fixed target. Given enough time, AIC converges to a fixed target smoothly and without instability, even in dynamically changing environments. However, this is not useful when the robot has to follow a motion trajectory, because the controller's parameters will typically not converge before the target changes to the next point in the trajectory. As a result, the robot always lags behind the target with AIC (see Figure 6), whereas our adaptive variable impedance controller compensates for external disturbances by directly predicting and canceling the forces, thus minimizing the lag in trajectory tracking.
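To clarify the contrast, a generic task-space variable-impedance law with feed-forward force compensation can be written as below; this is our shorthand for the structure of the base controller (the exact law is Eq. 4b), with $\hat{F}_{\mathrm{ff}}(t)$ denoting the incremental forward model's prediction of the environment forces to be canceled:

$$F_{\mathrm{cmd}}(t) \;=\; K_p(t)\,\big(x_d - x\big) \;+\; K_d(t)\,\big(\dot{x}_d - \dot{x}\big) \;+\; \hat{F}_{\mathrm{ff}}(t)$$

Because the disturbance is canceled by the feed-forward term rather than integrated away over time, the tracking error does not need to accumulate before the controller responds, which is why the lag seen with AIC does not arise.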

4.2.3 Comparison with a gain-scheduling controller:

We implemented Biomimetic Adaptive Control (BAC), an RL-based approach that iteratively updates control parameters (gains) at each point in a repeating trajectory (Yang et al., 2011). Given data from multiple trials of such a trajectory, parameters at each time step are updated using values from previous iterations at the same time step to minimize a cost function based on tracking error.
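The following is a minimal sketch of such a trial-indexed update, in the spirit of (Yang et al., 2011) but not the published update law; the step sizes, the error-proportional stiffening, and the gain bound are illustrative assumptions:

```python
import numpy as np

def update_gain_profile(K, errors, alpha=5.0, gamma=0.02, k_max=1500.0):
    """One across-trials update of a time-indexed stiffness profile.

    K:      (T, d) stiffness gains, one row per time step of the trajectory.
    errors: (T, d) tracking errors measured at the same time steps in the
            previous trial of the same T-step trajectory.
    """
    K_new = (1.0 - gamma) * K + alpha * np.abs(errors)  # decay, then stiffen
    return np.clip(K_new, 0.0, k_max)                   # keep gains bounded
```

Note that each row of the profile is updated independently of its neighbors, which is precisely why, as discussed below, tracking errors within a single trial can accumulate: no information flows between consecutive time steps.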

(a) BAC is unstable with incorrect initialization of controller parameters.
(b) BAC parameters do not converge in 30 trials in “multi-spring” environment.
Figure 7: Trajectory followed using BAC by robot in trial 30 in two different experiments. Dotted line indicates target trajectory and solid line is the actual path followed by robot.

In our experimental trials, we observed that bad parameter initialization causes instabilities and tracking irregularities that accumulate as the trajectory progresses, since BAC does not account for the temporal relationship between controller parameters within a trial. One example of such a trial with incorrect parameter initialization is shown in Figure 6(a). Due to this sensitivity to initialization, it is difficult to ensure smooth trajectory tracking when learning a time-indexed parameter profile.

Although BAC can provide good trajectory tracking in simple environments when good parameter initialization and enough trials are provided (Yang et al., 2011), the parameters often do not converge within a limited number of trials in more complex environments, such as the “multi-spring” environment in Figure 6(b). Such methods also rely on the duration of the task (i.e., the number of time steps in each trial) being fixed across trials.

The results from these comparison experiments show that the incremental nature of the feed-forward model and the online adaptation of the feedback gains help our controller outperform traditional adaptive control strategies in environments with continuous, smoothly changing dynamics, supporting our hypothesis H1.

4.3 Evaluating hybrid framework for piecewise continuous systems

In this section, we experimentally test the ability of our hybrid model learning framework to detect new contact modes and to learn and/or reuse them for variable impedance control within each mode. We demonstrated the need for a hybrid model when learning the dynamics of piecewise-continuous systems in our previous work (Sidhik et al., 2020), where we showed that a single model is insufficient for accurate trajectory tracking in the presence of such dynamics. Here, we evaluate the ability of the mode-detection component of our framework to identify known and unknown contact modes and to use the appropriate forward model for variable impedance control in each mode (H2).

Figure 8: Manipulator sliding an object in a pattern along three surfaces with different friction.

The robot was asked to slide an object along a desired trajectory, over three previously unseen surfaces with different values of friction (see Figure 8). We expected the robot to identify the transition to each new mode (i.e., each surface) and incrementally build a dynamics model for the mode while operating under high stiffness. Once the dynamics model for a mode had been built, we expected the robot to respond to subsequent transitions to this mode by using the corresponding dynamics model. The feature representation used for clustering to distinguish between contact modes was as described in Section 3.2; it differentiates between surfaces by estimating distributions over their friction coefficients.
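A hedged sketch of this mode-detection step is shown below: a short window of force measurements is summarized by a friction-like feature (ratio of tangential to normal force), and the summaries are clustered incrementally. The window length, feature vector, and threshold are illustrative assumptions; Section 3.2 describes the representation actually used, and we assume a BIRCH-style incremental clusterer (Zhang et al., 1997) as implemented in scikit-learn:

```python
import numpy as np
from sklearn.cluster import Birch

def friction_feature(forces, window=50):
    """Summarize recent force rows [fx, fy, fz] by the distribution of the
    tangential-to-normal force ratio, a proxy for the friction coefficient."""
    f = np.asarray(forces[-window:])
    tangential = np.linalg.norm(f[:, :2], axis=1)
    normal = np.abs(f[:, 2]) + 1e-6           # avoid division by zero
    mu = tangential / normal
    return np.array([[mu.mean(), mu.std()]])  # (1, 2) feature row

clusterer = Birch(n_clusters=None, threshold=0.05)
forces = np.abs(np.random.default_rng(0).normal(size=(200, 3)))  # placeholder data
feat = friction_feature(forces)
clusterer.partial_fit(feat)                   # incremental clustering update
mode_id = clusterer.predict(feat)[0]          # known vs. newly created mode
```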

Figure 9: Modes detected and their confidence values; red vertical lines on the x-axis indicate actual mode transitions. The number on top of a peak (in green) indicates the confidence with which the transition was identified; the number below a peak (in red) corresponds to the mode with the next highest confidence. “N” indicates a transition to a new mode.
Figure 10: Performance of the hybrid mode learning and controller for changing-surface task. Top: controller stiffness. Bottom: absolute error in trajectory tracking. The spikes during trajectory tracking correspond to a temporary, incorrect feed-forward prediction by the previous model after the guard regions.

Figure 9 shows the robot’s ability to detect mode changes in one trial of this experiment. The robot was able to identify transitions to existing or new modes with high confidence. In each instance, the second best choice of mode was associated with a much lower value of confidence. The results also show that our hybrid framework and feature representation make performance robust to changes in the direction of motion, i.e., a new mode is not identified when the manipulator moves over a previously seen surface in a new direction. There was some similarity in the confidence values for surfaces 2 and 3 (S2 and S3 in the plot) because of the similarity in their friction values.

Figure 10 shows the trajectory tracking error and the values of the stiffness parameters of the controller during the trial. The peaks in the trajectory error plot correspond to sudden changes of surface. During each such instance, the predictions made by the dynamics model of the previous mode caused a momentary error in trajectory tracking, until the robot switched to the high-stiffness mode and identified the current mode; the robot then used suitable low(er) stiffness to complete the task. It can also be seen that switching to a previously seen mode requires a much shorter period of high stiffness than building a new dynamics model. These results show that the framework is able to identify and learn the interaction dynamics for different contact modes, supporting (H2). In our previous work, we also demonstrated the ability of the framework to identify and learn the contact modes for a robot performing sliding tasks with different types of contacts (e.g., edge contact, surface contact) while being robust to changes in the direction of motion and the applied normal force (Sidhik et al., 2020).

4.4 Need for online learning for hybrid systems

To explore the need for incremental hybrid models for systems with piece-wise continuous dynamics (H3), we compared our hybrid model learning strategy with a baseline framework that performs offline long-term prediction of dynamics (Khader et al., 2020). It identifies different dynamic modes in the task from a training dataset of desired motion, uses multi-class Support Vector Machines to build guard functions that predict mode changes, builds a separate Gaussian Process (GP) dynamics model for each mode, and provides a probabilistic algorithm for multi-step prediction of joint-space state variables for the changing-contact manipulation task. This baseline framework learns a hybrid model from a training set of repeated task demonstrations. It was proposed as a data-efficient way of predicting long-term state evolution of joint-space dynamics for tasks involving discontinuous dynamics; it was not designed as a solution for real-time control in such tasks. However, it does present an interesting solution for the core problem of controlling tasks with discontinuous dynamics. We use it as a baseline to show the need for real-time control and learning in changing-contact robot manipulation.

We used the PyBullet simulation environment (Coumans and Bai, 2016) to set up a task similar to the one used with the baseline framework. As shown in Figure 0(b), the robot (with a block fixed to the end-effector) approaches a table to make contact with it, slides along the table until it collides with a wall, and then slides on the table along the wall. For evaluating H3, we wanted to test the following claims: (i) unlike our method for learning piecewise-continuous dynamics, the baseline approach is unreliable for tasks with discontinuous dynamics unless the robot uses high-stiffness control or is trained and tested in the same environment; and (ii) the real-time update of predictive (dynamics) models in our approach provides more reliable performance than the baseline's offline, long-term prediction of dynamics in the presence of unexpected environmental changes.

4.4.1 Long-term prediction of joint dynamics:

(a) System trained with wall friction 0.6, tested with wall friction 0.9.
(b) System trained with wall friction 0.1-0.7, tested on wall friction 0.9.
Figure 11: Long term prediction of one joint’s position using baseline framework. Dotted vertical line indicates robot’s contact with wall obstacle.

We first trained the baseline system with a dataset of trials in which the robot used a fixed medium-stiffness controller and the wall (second collision; red block in Figure 0(b)) friction was fixed to 0.6. Testing was then done with an unseen higher friction value (0.9) and the same controller. Figure 10(a) shows the predicted and actual positions of one of the relevant joints in the task (the first joint). Being unaware of the change in friction, the offline framework predicted the joints to move as freely as they did during training. The true joint positions were, however, affected by the higher friction, resulting in the “true” values lagging behind the “predicted” values.

For further exploration, the training set was modified to include wall friction values between 0.1 and 0.7 in fixed increments; the system was then tested on a wall with friction 0.9. This resulted in predictions similar to those in the previous set of experiments, but with a wider band of uncertainty around the prediction due to the larger variability in the training data (Figure 10(b)). In addition, when a high-stiffness controller is used for training and testing, the robot follows the trajectory quite closely regardless of friction. This results in lower variability across trials (marked by the narrower uncertainty band around the prediction) and more accurate predictions (Figure 12). These results indicate that learning dynamics in the space of joint positions and velocities poses a high-dimensional problem; the learned models do not capture the interaction dynamics accurately, and predictions are reliable only when used with a high-stiffness controller.

Figure 12: Long term prediction of the position of one joint produced by the baseline framework. System trained and tested using a constant high-stiffness controller (friction values as in Figure 10(b)).
Figure 13: Long term prediction of end-effector force along Y-axis (along the wall). System trained with wall friction values between 0.1 and 0.7, tested on wall with friction 0.9.

The main reason for the baseline framework's poorer performance in previously unseen environments is that the system is unaware of the environment's dynamics before it performs the prediction. This is not a flaw of the framework, as it was not designed for such use cases. However, the experiments show that for accurate long-term prediction of joint-space dynamics, either the environment has to remain unchanged and/or the robot has to use a high-stiffness controller. For environments whose dynamics can change in previously unseen ways, the system would at least have to be trained with many examples of different environments. Our framework addresses these issues by learning the dynamics in the end-effector space (i.e., task space instead of joint space), where it has access to a more direct measurement of the interaction dynamics in the form of end-effector forces and torques.

4.4.2 Long-term prediction in the task-space:

Next, the baseline framework was modified to consider interaction dynamics in the task space by reformulating the framework's GPs (which model each mode's dynamics) to model the task-space interaction dynamics, i.e., to predict the next end-effector force from the previous end-effector velocity and force, similar to the forward model of our base control framework (Section 3.1).
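A sketch of this reformulation is given below, using a GP regressor over task-space samples; the feature layout ([previous end-effector velocity, previous force] → next force), kernel choice, and placeholder data are our assumptions for illustration, not the baseline's exact implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # placeholder rows: [vel_prev (3), force_prev (3)]
y = rng.normal(size=(200, 3))  # placeholder targets: next end-effector force

# One GP per contact mode, trained offline on demonstration data.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X, y)
f_mean, f_std = gp.predict(X[:5], return_std=True)  # mean and uncertainty band
```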

The system was again trained with examples of the state evolution when the surface friction was between 0.1 and 0.7, and then compared with measurements taken when the surface friction was 0.9. Figure 13 shows the measured and predicted force measurements along the Y-axis (the direction along which the wall's friction acts). The earlier spikes in Figure 13 correspond to the robot making contact with the table, and the later spikes to the robot colliding with the wall. We observed that the actual values of the friction force after contact with the wall did not fall within the predicted band of uncertainty. Also, the effect of different levels of friction was more observable in the task space, indicating that it is more meaningful to represent the interaction dynamics of changing-contact tasks in the task space. This experiment suggests that interaction dynamics cannot be predicted reliably without some form of real-time feedback.

(a) Modified baseline framework’s prediction.
(b) Predictions from the forward model of our base controller.
Figure 14: Prediction of end-effector force along Y-axis (along the wall). Each system was trained with wall friction values between , and tested on a wall with friction 0.9. Dotted vertical line indicates when the robot made contact with the wall (obstacle).

4.4.3 One step prediction in task-space using real-time feedback:

In the next experiment, real-time measurements from the robot were provided as feedback to the baseline framework during testing. At each timestep, the system then only had to predict sensor values for the next timestep using the dynamics model of the identified mode. Although this strategy produced better results (see Figure 13(a)) by extrapolating from the learned model to handle previously unseen values, the predictions are still not accurate, because the baseline framework's model is not incrementally updated to account for the new environment dynamics. Our hybrid framework, on the other hand, supports incremental, real-time updates to the dynamics models. This resulted in more reliable predictions for the same experiment (see Figure 13(b)) and thus more accurate trajectory tracking in the presence of discretely changing dynamics and previously unseen environments.

Note that training the baseline framework required significant time, depending on the density and length of the data in the training set. The GP implementation in this framework required training time on the order of minutes on an 8-core, 16GB RAM computer without a GPU for a dataset consisting of 40 trials of the task mentioned above; the long-term prediction process during testing also takes on the order of minutes.

Overall, the experiments discussed above demonstrate that the incremental, real-time revision of the predictive (dynamics) models is critical for changing-contact manipulation tasks, particularly when the learning is done in the task space and suitable feedback of the system state is available (and used). The results of these experiments support hypothesis H3. The observed results are also influenced by the choice of the state space and feature representation used for control and learning. A hybrid control framework that supports such control and learning is better able to adapt to previously unseen variations in the dynamics and provide smooth control of changing-contact robot manipulation tasks.

4.5 Evaluating framework capability in producing smooth overall motion

In this section, we test the ability of the overall framework (AVIC + hybrid model learning + contact-change handling) to anticipate mode transitions and produce smooth overall motion by switching to appropriate transition-phase controllers in the anticipated regions (H4). We demonstrated the ability of the contact-change-handling module to incrementally improve the estimates of contact locations and learn appropriate approach velocities for impacts in our previous work (Sidhik et al., 2021).

Figure 15: A sliding task that involves making contacts with the table's surface at “1”, with a wall at “2”, and with another object at “3”.

4.5.1 Framework in the presence of multiple collisions:

To evaluate the overall framework and the resulting dynamics on a physical robot performing tasks with multiple collisions, the robot (with a wooden block attached to its end-effector) was asked to move vertically down to the table (contact 1), slide along the y-axis (the table's surface) to a wall (contact 2), and slide along the wall (while in contact with the table's surface) to another obstacle (contact 3), as shown in Figure 15. The robot was provided significantly incorrect initial guesses of the contact positions, with substantial noise (see Table 1). The robot had to repeat the task while reducing the deviation from the given motion pattern by improving its estimates of the contact positions. The robot also had to modify its approach velocity from its initial value to produce the desired impact force. Since each contact in the task occurs under different environment dynamics (e.g., motion in free space, motion against surface friction), the velocity required to attain the desired impact force was expected to differ across contacts. The robot thus had to incrementally update its approach velocity for each contact using gradient descent until the velocity that produced the desired impact force in that environment was achieved. Furthermore, the robot had to perform all the trials with smooth overall motion dynamics and minimal spikes in the velocity or acceleration profiles.
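A minimal sketch of the per-contact velocity adaptation is given below; the squared-error objective, the assumption that impact force increases monotonically with approach speed, and the step size are illustrative simplifications of the update rule described in Sidhik et al. (2021):

```python
def update_approach_velocity(v, f_measured, f_desired, eta=1e-3, v_min=0.01):
    """One gradient step on the impact-force error (f(v) - f*)^2 after a trial,
    folding the unknown sensitivity df/dv into the step size eta."""
    grad = 2.0 * (f_measured - f_desired)
    return max(v - eta * grad, v_min)  # keep a small positive approach speed

# e.g., with hypothetical values: v = update_approach_velocity(v, 18.0, 15.0)
```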

(a) Experiment trial 1.
(b) Experiment trial 5.
Figure 16: Velocity, acceleration, force, and controller activation levels in: (a) experimental trial 1; (b) experimental trial 5. Use of our framework reduces uncertainty in estimates of contact positions, reduces the time spent using the transition-phase controller, and reduces discontinuities.

Figure 15(a) shows the velocity, acceleration, and end-effector force in the first trial, and Figure 15(b) shows these values after five trials. The results in these figures and in Table 1 show that the uncertainty in the estimates of the contact positions is reduced, as indicated by a significant reduction in the size of the covariance ellipsoids, and that the robot spends significantly less time using the transition-phase controller and the associated lower velocity. The last plots in Figure 15(a) and Figure 15(b) show the activation of the default controller and the transition-phase controller. The overall task was also completed in considerably less time in the fifth trial than in the first. These results support H4.

Prediction Error (m)      Initial          Final (trial 5)
Contact 1 (Z-axis)        0.12 ± 0.3       0.016 ± 0.039
Contact 2 (Y-axis)        0.09 ± 0.2       0.011 ± 0.04
Contact 3 (X-axis)        0.1 ± 0.2        0.018 ± 0.036
Table 1: Error in the estimated contact location along the most significant axis for each contact (in parentheses) in the first and fifth trials of the task in Figure 15. The value along the diagonal of the corresponding covariance matrix is shown as the standard deviation (± term).

The covariance ellipsoids converged within the first three trials of the task, but the task was repeated to evaluate the ability to compute and set the approach velocity for the different transition-phase controllers. The robot converged to a suitable approach velocity for the first contact (from motion in free space) in five iterations. It was, however, difficult for the robot to adjust its approach velocities for contacts 2 and 3, which required force control along one and two directions, respectively. Contact 3 was particularly challenging because it involved sliding along two different surfaces, resulting in very noisy readings from the force-torque sensor due to the different frictional resistance offered by the two surfaces. Since the impact force was along the same direction as the friction, it was harder to isolate the impact force from the frictional force, making the approach-velocity update more challenging.

4.5.2 Framework effectiveness for impact-less transitions:

Figure 17: Selected sections of the velocity (top), acceleration (middle), and force (bottom) curves in the first trial of the task; the robot is unaware of the impact-less transitions in the task beforehand. Left: curves for the first impact-less transition (motion ‘2’ in Figure 0(a)); Right: curves for the second impact-less transition (motion ‘4’ in Figure 0(a)).

In this experiment, the objective was to test the effectiveness of the framework in learning to predict contact changes that arise not from collisions but from impact-less transitions.

The experiment involved the robot performing a changing-contact task similar to the previous one, except that the surface along which the robot slides changes without the robot's knowledge (see Figure 0(a)). The locations of the surface switches are not known beforehand, and the robot has to learn to anticipate them in subsequent trials once it realizes that there are previously unknown contact changes in the provided plan. The task involved the robot approaching a surface (surface A) from the top and sliding along it to make contact with a wall as before; however, midway, the surface changes to one with higher friction (surface B), unknown to the robot. The robot then has to slide along the wall and move to another wall parallel to it, sliding from surface B back to A on the way. As in the previous experiment, the robot is provided with initial guesses of where collisions occur in the task, but it is given no knowledge of the changes in surfaces; these it has to detect in its first trial and then predict and handle in subsequent trials.

The desired behavior is that the robot learns to anticipate collisions as well as impact-less contact changes, and smoothly switches to an appropriate controller that reduces the discontinuities in the motion dynamics during task execution.

(a) Experiment trial 2.
(b) Experiment trial 3.
Figure 18: Selected sections of the velocity (top), acceleration (middle), and force (bottom) curves in trials 2 and 3. Left: Curves for the first impact-less transition (motion ‘2’ in Figure 0(a)); Right: Curves for the second impact-less transition (motion ‘4’ in Figure 0(a))

In this experiment, we focus on the performance of the framework in the regions where the surfaces change, i.e., where there are impact-less mode transitions. This occurs at two sections in a trial of the task: Figure 17 shows the velocity, acceleration, and end-effector force profiles in these regions measured during the first trial. As expected, the robot experiences sudden spikes in velocity and acceleration (jerk) when the surface changes unexpectedly. Recall that this is due to the wrong predictions of the dynamics model, which provides a feed-forward term for the controller that either overestimates or underestimates the environment forces in the new mode. When the robot detects such a discontinuity, it identifies this as a contact change and quickly switches to a high-stiffness controller to identify the mode it is in. Finding that this is a new surface it has not seen before, the robot learns a new dynamics model for the mode, which it then uses as the forward model for the base controller in that mode. The surface of the table changes twice during the task; for the second surface change, however, the robot recognizes the new dynamics mode as a surface it has seen before (surface A).

In the next trial (Figure 17(a)), the robot expects these mode switches (the anticipated region is marked in the figure). Because it anticipates an impact-less contact change, the robot switches to a high-stiffness controller, which helps it identify the new mode quickly while also reducing sudden velocity and acceleration spikes during motion. As desired, the size of the anticipated region shrinks in subsequent trials (Figure 17(b)). These results further support hypothesis H4.

An interesting phenomenon observed during the experiment was that the robot using this high-stiffness controller easily detects a contact change when it slides from a surface of lower friction (A) to one of higher friction (B), but does not always detect the mode change when sliding from B to A. This is because, when sliding a block from a low-friction surface to a high-friction surface, the region of the block in contact with the new surface suddenly offers much higher resistance, producing a sudden and pronounced change in the frictional force (as seen in the force curves in Figure 17). When sliding from a high-friction surface to a low-friction surface, on the other hand, the trailing part of the block remains in contact with the high-friction surface, which still offers resistance; the sensed frictional resistance therefore decreases gradually rather than suddenly. This lack of discontinuity in the sensed forces, together with the absence of velocity spikes due to the high-stiffness controller, motivated adding another component to the contact-change-detection module of the framework: a threshold on the error between the measured end-effector force and the prediction from a (non-updating) forward model for that mode.
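A minimal sketch of this added test, assuming a simple norm threshold (the value below is illustrative), is:

```python
import numpy as np

def contact_change_detected(f_measured, f_predicted, threshold=2.0):
    """Flag an impact-less contact change when the measured end-effector force
    deviates from the frozen (non-updating) forward model's prediction by more
    than a threshold; complements the velocity/acceleration-spike test."""
    return np.linalg.norm(np.asarray(f_measured) - np.asarray(f_predicted)) > threshold
```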

5 Conclusion

Many robot manipulation (and human manipulation) tasks are changing-contact manipulation tasks. They are characterized by piece-wise continuous interaction dynamics: discontinuous when contacts, surfaces, or other factors change, and continuous elsewhere. While it is possible to construct a hybrid framework with continuous dynamics within each of a set of discrete modes, it is difficult to provide comprehensive information about the modes in practical tasks.

In this paper, we presented an adaptive, hybrid framework for changing-contact manipulation tasks inspired by studies in human motor control. The framework has a high-level mode-detection module that can recognize existing contact modes and identify new ones. Each such mode is characterized by a suitable predictive/forward dynamics model, control law, and relevance condition. The forward model is learned and revised incrementally at run-time, with the error between predicted and actual measurements (of end-effector forces and torques) used to adapt the gain parameters of the control law. Our framework also includes a contact-handling module that uses a Kalman filter to incrementally improve estimates of contact positions. These estimates help revise the velocity profile to minimize the time spent in a transition-phase controller, ensure smooth transitions to and from this controller, and achieve a desired force on impact. Our representational choices enable us to simplify and address the associated challenges reliably and efficiently.

We showed the effectiveness of the overall framework on a physical robot performing manipulation tasks that involve multiple contact changes, and experimentally demonstrated the need for an adaptive control and learning strategy in tasks and environments with a mix of continuous and discontinuous dynamics. We compared our framework's performance with a representative method from each of three classes of adaptive control methods in the existing literature, and with a sophisticated framework for offline, long-term prediction of discontinuous dynamics, to highlight the capabilities of our framework. We did so in simulated as well as physical systems in which a robot executes motion patterns that involve making and breaking contacts on a tabletop. Results from our experiments support our claim that a hybrid framework supporting adaptive control and learning enables smooth control of changing-contact robot manipulation tasks.

References

  • F. J. Abu-Dakka, L. Rozo, and D. G. Caldwell (2018) Force-based variable impedance learning for robotic manipulation. Robotics and Autonomous Systems 109, pp. 156–167. Cited by: §2.2.
  • W. Ahmad (2006) Incremental learning of Gaussian mixture models. Cited by: §3.1.
  • K. Ahn, W. K. Chung, and Y. Yourn (2004) Arbitrary states polynomial-like trajectory (aspot) generation. In 30th Annual Conference of IEEE Industrial Electronics Society, 2004. IECON 2004, Vol. 1, pp. 123–128. Cited by: §2.5.
  • A. Ajay, M. Bauza, J. Wu, N. Fazeli, J. B. Tenenbaum, A. Rodriguez, and L. P. Kaelbling (2019) Combining physical simulators and object-based networks for control. In 2019 International Conference on Robotics and Automation (ICRA), pp. 3217–3223. Cited by: §2.1, §2.1.
  • M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, et al. (2018) Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177. Cited by: §2.1.
  • P. R. Barragän, L. P. Kaelbling, and T. Lozano-Pérez (2014) Interactive bayesian identification of kinematic mechanisms. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 2013–2020. Cited by: §2.4.
  • M. Beetz, F. Stulp, P. Esden-Tempski, A. Fedrizzi, U. Klank, I. Kresse, A. Maldonado, and F. Ruiz (2010) Generality and legibility in mobile manipulation. Autonomous Robots 28 (1), pp. 21. Cited by: §2.1.
  • L. Biagiotti and C. Melchiorri (2008) Trajectory planning for automatic machines and robots. Springer Science & Business Media. Cited by: §2.5.
  • B. Brogliato (2019) Nonsmooth mechanics. models, dynamics and control: erratum/addendum. Ph.D. Thesis, INRIA Grenoble-Rhone-Alpes. Cited by: §2.4.
  • L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko (2018) Reinforcement learning for control: performance, stability, and deep approximators. Annual Reviews in Control 46, pp. 8–28. Cited by: §2.1.
  • S. Calinon, I. Sardellitti, and D. G. Caldwell (2010) Learning-based control strategy for safe human-robot interaction exploiting task and robot redundancies. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 249–254. Cited by: §2.2.
  • E. Catto (2017) Box2D. http://box2d.org. Cited by: §4.1.
  • A. Chatterjee (1997) Rigid body collisions: some general considerations, new collision laws, and some experimental data. Cornell University. Cited by: §2.4.
  • A. Chatterjee (1999) On the realism of complementarity conditions in rigid body collisions. Nonlinear Dynamics 20 (2), pp. 159–168. Cited by: §2.1.
  • E. Coumans and Y. Bai (2016) PyBullet, a python module for physics simulation for games, robotics and machine learning. Note: http://pybullet.org Cited by: §4.4.
  • B. De Schutter, W. Heemels, J. Lunze, C. Prieur, et al. (2009) Survey of modeling, analysis, and control of hybrid systems. Handbook of Hybrid Systems Control–Theory, Tools, Applications, pp. 31–55. Cited by: §2.3.
  • P. Engel and M. Heinen (2010) Incremental learning of multivariate Gaussian mixture models. In BSAI. https://link.springer.com/chapter/10.1007/978-3-642-16138-4_9. Cited by: §3.1.
  • Y. Fan, W. Gao, W. Chen, and M. Tomizuka (2017) Real-time finger gaits planning for dexterous manipulation. IFAC-2017 50 (1), pp. 12765–12772. Cited by: §2.1.
  • N. Fazeli, S. Zapolsky, E. Drumwright, and A. Rodriguez (2020) Fundamental limitations in performance and interpretability of common planar rigid-body contact models. In Robotics Research, pp. 555–571. Cited by: §2.1, §2.4.
  • J. R. Flanagan, P. Vetter, R. S. Johansson, and D. M. Wolpert (2003) Prediction precedes control in motor learning. Current Biology 13 (2), pp. 146–150. Cited by: §2.1.
  • P. Freeman (2012) Minimum jerk trajectory planning for trajectory constrained redundant robots. Cited by: §2.5.
  • A. Gams, B. Nemec, A. J. Ijspeert, and A. Ude (2014) Coupling movement primitives: interaction with the environment and bimanual tasks. IEEE Transactions on Robotics 30 (4), pp. 816–830. Cited by: §2.2.
  • M. Gandhi, Y. Pan, and E. Theodorou (2017) Pseudospectral model predictive control under partially learned dynamics. arXiv preprint arXiv:1702.04800. Cited by: §2.1.
  • R. Grassmann, L. Johannsmeier, and S. Haddadin (2018) Smooth point-to-point trajectory planning in (3) with self-collision and joint constraints avoidance. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9. Cited by: §2.5.
  • M. Halm and M. Posa (2021) Set-valued rigid body dynamics for simultaneous frictional impact. arXiv preprint arXiv:2103.15714. Cited by: §2.4.
  • K. Hausman, J. T. Springenberg, Z. Wang, N. Heess, and M. Riedmiller (2018) Learning an embedding space for transferable robot skills. In International Conference on Learning Representations, Cited by: §2.1.
  • B. Huang, M. Li, R. L. De Souza, J. J. Bryson, and A. Billard (2016) A modular approach to learning manipulation strategies from human demonstration. Autonomous Robots 40 (5), pp. 903–927. Cited by: §2.1.
  • J. M. Hyde, M. R. Tremblay, and M. R. Cutkosky (1997) An object-oriented framework for event-driven dextrous manipulation. In Experimental Robotics IV, pp. 51–61. Cited by: §2.5.
  • M. K. Johansson (2003) Piecewise linear control systems: a computational approach. Vol. 284, Springer. Cited by: §2.3.
  • A. M. Johnson, S. A. Burden, and D. E. Koditschek (2016) A hybrid systems model for simple manipulation and self-manipulation systems. The International Journal of Robotics Research 35 (11), pp. 1354–1392. Cited by: §2.1.
  • D. Katz and O. Brock (2011) A factorization approach to manipulation in unstructured environments. In Robotics Research, pp. 285–300. Cited by: §2.4.
  • M. Kawato (1999) Internal models for motor control and trajectory planning. Current Opinion in Neurobiology (6), pp. 718–727. https://www.ncbi.nlm.nih.gov/pubmed/10607637. Cited by: §1.
  • C. C. Kemp, A. Edsinger, and E. Torres-Jara (2007) Challenges for robot manipulation in human environments [grand challenges of robotics]. IEEE Robotics & Automation Magazine 14 (1), pp. 20–29. Cited by: §2.4.
  • S. A. Khader, H. Yin, P. Falco, and D. Kragic (2020) Data-efficient model learning and prediction for contact-rich manipulation tasks. IEEE Robotics and Automation Letters 5 (3), pp. 4321–4328. Cited by: §4.4.
  • M. Kopicki, S. Zurek, R. Stolkin, T. Moerwald, and J. L. Wyatt (2017) Learning modular and transferable forward models of the motions of push manipulated objects. Autonomous Robots 41 (5), pp. 1061–1082. Cited by: §2.1.
  • A. Kramberger, E. Shahriari, A. Gams, B. Nemec, A. Ude, and S. Haddadin (2018) Passivity based iterative learning of admittance-coupled dynamic movement primitives for interaction with changing environments. In International Conference on Intelligent Robots and Systems, pp. 6023–6028. Cited by: §2.2.
  • O. Kroemer, S. Niekum, and G. Konidaris (2019) A review of robot learning for manipulation: challenges, representations, and algorithms. arXiv preprint arXiv:1907.03146. Cited by: §1, §2.1, §2.4.
  • K. Kronander and A. Billard (2014) Learning compliant manipulation through kinesthetic and tactile human-robot interaction. IEEE Transactions on Haptics 7 (3), pp. 367–380. Cited by: §2.1.
  • A. G. Kupcsik, M. P. Deisenroth, J. Peters, G. Neumann, et al. (2013) Data-efficient generalization of robot skills with contextual policy search. In AAAI 2013, pp. 1401–1407. Cited by: §2.1.
  • A. X. Lee, H. Lu, A. Gupta, S. Levine, and P. Abbeel (2015) Learning force-based manipulation of deformable objects from multiple demonstrations. In International Conference on Robotics and Automation, pp. 177–184. https://people.eecs.berkeley.edu/~pabbeel/papers/2015-ICRA-TPS-LfD-forces.pdf. Cited by: §2.2.
  • S. Levine and V. Koltun (2013) Guided policy search.. In International Conference on Machine Learning, pp. 1–9. Cited by: §2.1.
  • C. K. Liu (2009) Dextrous manipulation from a grasping pose. In ACM Transactions on Graphics, Vol. 28, pp. 59. Cited by: §2.1.
  • G. Marth, T. J. Tarn, and A. K. Bejczy (1993) Stable phase transition control for robot arm motion. In [1993] Proceedings IEEE International Conference on Robotics and Automation, pp. 355–362. Cited by: §2.5.
  • M. J. Mathew, S. Sidhik, M. Sridharan, M. Azad, A. Hayashi, and J. Wyatt (2019) Online Learning of Feed-Forward Models for Task-Space Variable Impedance Control. In International Conference on Humanoid Robots, Cited by: §1, §2.2, §4.1.
  • J. K. Mills and D. M. Lokhorst (1993) Control of robotic manipulators during general task execution: a discontinuous control approach. The International Journal of Robotics Research 12 (2), pp. 146–163. Cited by: §2.5.
  • A. Nagabandi, K. Konolige, S. Levine, and V. Kumar (2020) Deep dynamics models for learning dexterous manipulation. In Conference on Robot Learning, pp. 1101–1112. Cited by: §2.1.
  • S. Nam and M. Yang (2004) A study on a generalized parametric interpolator with real-time jerk-limited acceleration. Computer-Aided Design 36 (1), pp. 27–36. Cited by: §2.5.
  • S. Niekum, S. Chitta, A. G. Barto, B. Marthi, and S. Osentoski (2013) Incremental semantically grounded learning from demonstration.. In Robotics: Science and Systems, Vol. 9, pp. 10–15607. Cited by: §2.3.
  • M. Parmar, M. Halm, and M. Posa (2021) Fundamental challenges in deep learning for stiff contact dynamics. arXiv preprint arXiv:2103.15406. Cited by: §2.4.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. Cited by: §3.2.
  • C. Pezzato, R. Ferrari, and C. H. Corbato (2020) A novel adaptive controller for robot manipulators based on active inference. IEEE Robotics and Automation Letters 5 (2), pp. 2973–2980. Cited by: §2.2, §4.2.2.
  • A. Piazzi and A. Visioli (2000) Global minimum-jerk trajectory planning of robot manipulators. IEEE transactions on industrial electronics 47 (1), pp. 140–149. Cited by: §2.5.
  • J. M. Romano, K. Hsiao, G. Niemeyer, S. Chitta, and K. J. Kuchenbecker (2011) Human-inspired robotic grasp control with tactile sensing. IEEE Transactions on Robotics 27 (6), pp. 1067–1079. Cited by: §2.3.
  • L. Rozo, S. Calinon, D. G. Caldwell, P. Jimenez, and C. Torras (2016) Learning physical collaborative robot behaviors from human demonstrations. IEEE Transactions on Robotics 32 (3), pp. 513–527. Cited by: §2.2.
  • S. Sidhik, M. Sridharan, and D. Ruiken (2020) Learning hybrid models for variable impedance control of changing-contact manipulation tasks. In Annual Conference on Advances in Cognitive Systems (ACS). Cited by: §1, §2.2, §2.5, §4.3.
  • S. Sidhik, M. Sridharan, and D. Ruiken (2021) Towards a framework for changing-contact robot manipulation. In International Conference on Intelligent Robots and Systems (IROS), Cited by: §1, §4.5.
  • M. Song and H. Wang (2005) Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering. In Proceedings of SPIE, Vol. 5803, pp. 174–183. Cited by: §3.1.
  • D. Stoianovici and Y. Hurmuzlu (1996) A critical study of the applicability of rigid-body collision theory. Cited by: §2.4.
  • F. Stulp, E. A. Theodorou, and S. Schaal (2012) Reinforcement learning with sequences of motion primitives for robust manipulation. IEEE Transactions on robotics 28 (6), pp. 1360–1370. Cited by: §2.1.
  • H. G. Sung (2004) Gaussian mixture regression and classification. Ph.D. Thesis, Rice University. Cited by: §3.1.
  • M. Tarokh (1991) Hyperstability approach to the synthesis of adaptive controllers for robot manipulators. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pp. 2154–2159. Cited by: §4.2.1.
  • M. Toussaint, K. Allen, K. A. Smith, and J. B. Tenenbaum (2018) Differentiable physics and stable modes for tool-use and manipulation planning.. In Robotics: Science and Systems, Cited by: §2.1.
  • E. Ugur and J. Piater (2015) Bottom-up learning of object categories, action effects and logical rules: from continuous manipulative exploration to symbolic planning. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2627–2633. Cited by: §2.4.
  • P. Wieber, R. Tedrake, and S. Kuindersma (2016) Modeling and control of legged robots. In Springer handbook of robotics, pp. 1203–1234. Cited by: §2.4.
  • C. Yang, G. Ganesh, S. Haddadin, S. Parusel, A. Albu-Schaeffer, and E. Burdet (2011) Human-like adaptation of force and impedance in stable and unstable interactions. IEEE transactions on robotics 27 (5), pp. 918–930. Cited by: §4.2.3, §4.2.3.
  • D. Zhang and B. Wei (2017) A review on model reference adaptive control of robotic manipulators. Annual Reviews in Control 43, pp. 188–198. Cited by: §2.2.
  • T. Zhang, R. Ramakrishnan, and M. Livny (1997) BIRCH: a new data clustering algorithm and its applications. Data Mining and Knowledge Discovery 1 (2), pp. 141–182. Cited by: §3.2.