I Introduction
The ability to handle unexpected sensor events is key to robustly executing manipulation tasks. Humans, for instance, can predict how it should feel to pick up an object and correct a grasp if the actual experience deviates from this prediction. Phrased differently, humans can map errors in sensory space to corrections in action space. In order to endow our robots with this ability, two problems need to be tackled: First, the system needs to be able to predict what sensor measurements to expect. Second, it needs to learn how to map deviations from those predictions to changes in actions.
Learning what sensor measurements to expect at any moment in time, anywhere in the state space, is a challenging problem with no known viable solution. However, associating sensor information with successful executions of motion primitives has been shown to be promising [1, 2]. Once such sensor traces have been associated with a primitive, the robot can try to correct the primitive's nominal actions when the actual sensor readings deviate from what is expected. In order to do so, a feedback model that maps errors in sensor space to corrective actions needs to be acquired.
In initial implementations of such Associative Skill Memories (ASMs) [1], a linear feedback model was used. This feedback model essentially multiplies the sensor trace error with a manually defined feedback gain matrix to compute acceleration changes. While hand-designing feedback models can work well for specific problem settings, this approach is not expected to generalize beyond the scenario it was tuned for. Furthermore, when considering high-dimensional and multimodal sensory input, such as haptic feedback, manually designing a feedback policy quickly becomes infeasible. For example, in this work we consider tactile-driven manipulation with tools. Manipulation tasks involving tools are challenging due to inaccurate tool kinematics models and non-rigid contacts between tactile sensors and the tool.
Thus, the larger goal of this research is to equip Associative Skill Memories with a general feedback modulation learning framework, as depicted in the block diagram in Figure 1. Data-driven approaches to learning such feedback models have been proposed [3, 4, 5] in the past. Here, we present a learning framework that improves such data-driven approaches in generality and experimental validation. First, we contribute towards the goal of generality by proposing the use of phase-modulated neural networks (PMNNs). Our previous work [4] shows that feedforward neural networks (FFNNs) have greater flexibility to learn feedback policies from human demonstrations than a hand-designed model. However, FFNNs cannot capture phase-dependent sensory features or corrective actions. Thus, in this paper, we introduce PMNNs, which can learn phase-dependent feedback models, and show that this improves learning performance when compared to regular FFNNs. Second, we present detailed insight into our experimental pipeline for learning feedback models on a tactile-driven manipulation task. Furthermore, we extensively evaluate our learning approach on this manipulation task across multiple task variations and successfully deploy it on a real robot.
This paper is organized as follows. Section II provides some background on the motion primitive representation and related work. Section III presents the details of our approach for learning feedback models from demonstrations. We then present insights into our experimental setup in Section IV. Finally, we evaluate our approach in Section V and conclude with Section VI.
II Background and Related Work
Here we review background material on our chosen motion primitive representation and related work on learning feedback models, including tactile feedback learning.
II-A Quaternion DMPs
The Associative Skill Memories framework, as proposed in [2], uses Dynamic Movement Primitives (DMPs) [6] as the motion primitive representation. DMPs describe goal-directed behaviors as a set of differential equations with well-defined attractor dynamics. It is this formulation of DMPs as a set of differential equations that allows for online modulation from various inputs, such as sensor traces, in a manner that is conceptually straightforward and simple to implement, relative to other movement representations.
In our work, DMPs need to represent both the position and orientation of the end-effector. We refer the reader to [4] for our position DMP formulation. Here we focus on reviewing Quaternion DMPs, which we use for orientation representation in our learning-from-demonstration experiments.
Quaternion DMPs were first introduced in [1], and then improved in [7, 8] to fully take into account the geometry of SO(3). Like position DMPs, they consist of a transformation system and a canonical system, governing the evolution of the orientation state and movement phase, respectively.
The transformation system of a Quaternion DMP is¹:

$$\tau \dot{\boldsymbol{\omega}} = \alpha_{\omega} \left( \beta_{\omega} \, 2 \log\left(\mathbf{Q}_g \circ \mathbf{Q}^{*}\right) - \tau \boldsymbol{\omega} \right) + \mathbf{f} + \mathbf{c} \quad (1)$$

where $\mathbf{Q}$ is a unit quaternion representing the orientation, $\mathbf{Q}_g$ is the goal orientation, and $\boldsymbol{\omega}, \dot{\boldsymbol{\omega}}$ are the 3D angular velocity and angular acceleration, respectively. $\mathbf{f}$ and $\mathbf{c}$ are the 3D orientation forcing term and feedback/coupling term², respectively. The forcing term encodes the nominal behavior, while the coupling term encodes behavior adaptation, which is commonly based on sensory feedback. In this paper, we focus on learning a feedback model that generates the coupling term, as described in Sub-Section III-B. During unrolling, we integrate Equation 1 forward in time to generate the kinematic orientation trajectory as follows:

$$\mathbf{Q}_{t+1} = \exp\left(\frac{\boldsymbol{\omega} \, \Delta t}{2}\right) \circ \mathbf{Q}_t \quad (2)$$

where $\Delta t$ is the integration step size. We set the constants $\alpha_{\omega}$ and $\beta_{\omega}$ to obtain a critically-damped system response when both forcing term and coupling term are zero. $\tau$ is set proportional to the motion duration.

¹For defining Quaternion DMPs, the composition operator $\circ$, the conjugation operator $*$, and the generalized log and exponential maps $\log(\cdot)$ and $\exp(\cdot)$ are required. These operators are defined in Equations 13, 14, 15, and 16 in the Appendix.
²Throughout this paper, we use the terms feedback term and coupling term interchangeably.
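To make the transformation system and the integration step concrete, here is a minimal numerical sketch (not the authors' implementation), assuming a [w, x, y, z] quaternion convention and illustrative gains alpha = 25, beta = alpha/4; the quaternion operators follow the definitions in the Appendix:

```python
import numpy as np

def quat_mul(qa, qb):
    # Quaternion composition (Appendix, Eq. 13), [w, x, y, z] convention
    wa, va = qa[0], qa[1:]
    wb, vb = qb[0], qb[1:]
    w = wa * wb - va.dot(vb)
    v = wa * vb + wb * va + np.cross(va, vb)
    return np.concatenate(([w], v))

def quat_conj(q):
    # Quaternion conjugation (Appendix, Eq. 14)
    return np.concatenate(([q[0]], -q[1:]))

def quat_log(q):
    # Generalized log map SO(3) -> so(3) (Appendix, Eq. 15)
    w, v = np.clip(q[0], -1.0, 1.0), q[1:]
    nv = np.linalg.norm(v)
    return np.zeros(3) if nv < 1e-12 else np.arccos(w) * v / nv

def quat_exp(r):
    # Generalized exp map so(3) -> SO(3) (Appendix, Eq. 16)
    nr = np.linalg.norm(r)
    if nr < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(nr)], np.sin(nr) * r / nr))

def quat_dmp_step(Q, omega, Qg, tau, dt, f=np.zeros(3), c=np.zeros(3),
                  alpha=25.0, beta=25.0 / 4.0):
    # Transformation system (Eq. 1), then one Euler integration step (Eq. 2)
    omega_dot = (alpha * (beta * 2.0 * quat_log(quat_mul(Qg, quat_conj(Q)))
                          - tau * omega) + f + c) / tau
    omega_new = omega + omega_dot * dt
    Q_new = quat_mul(quat_exp(omega_new * dt / 2.0), Q)
    return Q_new / np.linalg.norm(Q_new), omega_new
```

With zero forcing and coupling terms, repeated calls drive the log-space orientation error toward zero, i.e. the attractor behavior described above.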
The movement phase variable $p$ and phase velocity $u$ are governed by the second-order canonical system as follows:
$$\tau \dot{p} = u \quad (3)$$

$$\tau \dot{u} = \alpha_{u} \left( \beta_{u} \left(0 - p\right) - u \right) \quad (4)$$
We set the constants $\alpha_u$ and $\beta_u$. The phase variable $p$ is initialized at 1 and will converge to 0. The phase velocity $u$, on the other hand, has initial value 0 and will converge to 0. Note that for a multi degree-of-freedom (DOF) system, each DOF has its own transformation system, but all DOFs share the same canonical system [6]. The forcing term $\mathbf{f}$ governs the shape of the primitive and is represented as a weighted combination of $N$ basis functions $\psi_i$ with width parameter $h_i$ and center $c_i$, as follows:
$$\mathbf{f}\left(p, u\right) = \frac{\sum_{i=1}^{N} \psi_i\left(p\right) \mathbf{w}_i}{\sum_{j=1}^{N} \psi_j\left(p\right)} \, u \quad (5)$$
where

$$\psi_i\left(p\right) = \exp\left(-h_i \left(p - c_i\right)^2\right) \quad (6)$$
Note that because the forcing term $\mathbf{f}$ is modulated by the phase velocity $u$, it is initially zero and will converge back to zero.
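As a sketch of the canonical system (Eqs. 3 and 4) and the phase-modulated forcing term (Eqs. 5 and 6), with illustrative constants rather than the paper's exact values:

```python
import numpy as np

def canonical_step(p, u, tau, dt, alpha_u=25.0, beta_u=25.0 / 4.0):
    # Second-order canonical system (Eqs. 3-4): p decays from 1 to 0,
    # u starts at 0 and converges back to 0
    p_next = p + (u / tau) * dt
    u_next = u + (alpha_u * (beta_u * (0.0 - p) - u) / tau) * dt
    return p_next, u_next

def forcing_term(p, u, w, centers, widths):
    # Normalized phase RBFs (Eq. 6), weighted by w and modulated by the
    # phase velocity u (Eq. 5); hence f = 0 whenever u = 0
    psi = np.exp(-widths * (p - centers) ** 2)
    return psi.dot(w) / (psi.sum() + 1e-10) * u
```

Because `forcing_term` is scaled by `u`, it is exactly zero at the start of the primitive and vanishes again as the canonical system converges.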
The basis function weights $\mathbf{w}_i$ in Equation 5 are learned from human demonstrations of baseline/nominal behaviors by setting the target regression variable:

$$\mathbf{f}_{\text{target}} = \tau \dot{\boldsymbol{\omega}}_{bd} - \alpha_{\omega} \left( \beta_{\omega} \, 2 \log\left(\mathbf{Q}_{g} \circ \mathbf{Q}_{bd}^{*}\right) - \tau \boldsymbol{\omega}_{bd} \right)$$

where $\left\{\mathbf{Q}_{bd}, \boldsymbol{\omega}_{bd}, \dot{\boldsymbol{\omega}}_{bd}\right\}$ is the set of baseline/nominal orientation behavior demonstrations. Then we can perform linear regression to identify the parameters $\mathbf{w}_i$, as shown in [6]. Finally, we include a goal evolution system as follows:
$$\tau \boldsymbol{\omega}_g = \alpha_{g} \, 2 \log\left(\mathbf{Q}_{G} \circ \mathbf{Q}_{g}^{*}\right) \quad (7)$$
where $\mathbf{Q}_g$ and $\mathbf{Q}_G$ are the evolving and steady-state goal orientations, respectively. We set the constant $\alpha_g$. The goal evolution system has two important roles related to safety during algorithm deployment on robot hardware. The first role, as mentioned in [6], is to avoid discontinuous jumps in accelerations when the goal is suddenly moved. The second role, as mentioned in [9], is to ensure continuity between the state at the end of one primitive and the state at the start of the next when executing a sequence of primitives. Here we ensure continuity between primitives for both position and orientation DMPs by adopting [9].
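The linear regression step above can be sketched as follows; `fit_forcing_weights` is a hypothetical helper (the name is ours) that solves for the basis weights of Eq. (5) by ordinary least squares, given per-timestep forcing-term targets computed from a demonstration:

```python
import numpy as np

def fit_forcing_weights(p_traj, u_traj, f_target, centers, widths):
    # Design matrix of normalized phase RBFs (Eq. 6) scaled by the phase
    # velocity, so that phi @ w reproduces the forcing term of Eq. (5)
    # at every timestep of the demonstration.
    psi = np.exp(-widths[None, :] * (p_traj[:, None] - centers[None, :]) ** 2)
    phi = psi / (psi.sum(axis=1, keepdims=True) + 1e-10) * u_traj[:, None]
    w, *_ = np.linalg.lstsq(phi, f_target, rcond=None)
    return w, phi
```

When the targets lie in the span of the design matrix, the fitted weights reproduce the demonstrated forcing term exactly; in practice the fit is a least-squares approximation over noisy demonstrations.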
II-B Related Work on Learning Feedback Models
The ability to adapt movement plans to changes in the environment requires feedback models. In previous work, researchers have hand-designed feedback models for specific purposes. For instance, [10, 11] devised feedback models for obstacle avoidance. [12] designed a human-inspired feedback model for performing robotic surface-to-surface contact alignment based on force-torque sensing. Force-torque sensing is also used in [1], where a hand-designed feedback gain matrix maps deviations from the expected force-torque measurements to the grasp plan adaptation.
Previous work on robotic tactile-driven manipulation with tools has tried to learn feedback models to correct position plans for handling uncertainty between tools and the environment, via reinforcement learning [5] or motor babbling [13]. In our work, we propose to bootstrap the learning of feedback models from human demonstrations. Abu-Dakka et al. [14] iteratively learned feedforward terms to improve a force-torque-guided task execution over trials, while fixing the feedback models as constant gain matrices.
Learning by demonstration is also employed in [15] to train separate feedback models for different environmental settings. Gaussian process regression is used to interpolate between these learned models to predict the required feedback model in a new environmental setting. Our work, in contrast, directly uses a single model to handle multiple settings.
Kupcsik et al. [16] learn the mapping from contexts (or environmental settings) to DMP parameters. We, on the other hand, learn the mapping from sensory input to the plan adaptation, abstracting away the pre-specification of the context.
In [17], a partially-observable Markov decision process (POMDP), parameterized by deep recurrent neural networks, is used to represent a haptic feedback model. In general, POMDP models are not explicitly provided with information about the movement phase, which is essential for predicting the next corrective action. Our proposed approach can learn phase-dependent corrective actions.
III Learning Feedback Models via Phase-Modulated Neural Networks
In this section, we describe our framework for learning general feedback models from human demonstrations. The process pipeline of learning feedback models is visualized in Figure 2. For a specific instance of this pipeline in our experiment, please refer to Sub-Section IV-C. Our framework comprises three core components: learning expected sensor traces; learning the feedback model that maps sensor trace errors to corrections; and PMNNs, a feedback model representation that is flexible enough to capture phase-dependent features and can learn across multiple task settings.
III-A Learning Expected Sensor Traces
The core idea of ASMs [1], [2] rests on the insight that similar task executions should yield similar sensory events. Thus, an ASM of a task includes both a movement primitive as well as the expected sensor traces associated with this primitive's execution in a known environment. We term this execution the primitive's nominal behavior, the known environment the nominal setting, and the expected sensor traces $\mathbf{S}_{\text{expected}}$. To learn the model, we execute the nominal behavior and collect the experienced sensor measurements. Since these measurements are trajectories by nature, we can encode them using DMPs to obtain $\mathbf{S}_{\text{expected}}$. This has the advantage that $\mathbf{S}_{\text{expected}}$ is phase-aligned with the position and Quaternion DMPs' execution, because they all share the same canonical system in Equations 3 and 4.
III-B Learning Feedback Models from Demonstration
When a movement primitive is executed under environment variations and/or uncertainties, the perceived sensor traces, denoted as the actual sensor traces $\mathbf{S}_{\text{actual}}$, tend to deviate from $\mathbf{S}_{\text{expected}}$. This disparity can be used to drive corrections that adapt to the environmental changes causing the deviated sensor traces. Previous work [5, 18] uses reinforcement learning to learn these corrective behaviors, also in the form of feedback models. However, learning a good feedback policy via trial-and-error from scratch is a very slow process. Therefore, we would like to bootstrap this process by learning feedback models from demonstrations. In our supervised learning framework, the disparity $\Delta \mathbf{S} = \mathbf{S}_{\text{actual}} - \mathbf{S}_{\text{expected}}$ is used as the input to a feedback model $h$, which maps it to the motion plan adaptation, i.e. the coupling term $\mathbf{c}$ (from Equation 1), as follows:

$$\mathbf{c} = h\left(\Delta \mathbf{S}\right) \quad (8)$$
We pose this as a regression problem, and similar to learning the nominal behavior, we can also learn this feedback model from human demonstrations of corrected behavior, i.e. the demonstrated behavior when the feedback is active. To perform the learning-from-demonstration, we need to extract the target output variable, i.e. the target coupling term $\mathbf{c}_{\text{target}}$, from the demonstration data, which can be done as follows:

$$\mathbf{c}_{\text{target}} = \tau \dot{\boldsymbol{\omega}}_{cd} - \alpha_{\omega} \left( \beta_{\omega} \, 2 \log\left(\mathbf{Q}_{g} \circ \mathbf{Q}_{cd}^{*}\right) - \tau \boldsymbol{\omega}_{cd} \right) - \mathbf{f} \quad (9)$$

where $\left\{\mathbf{Q}_{cd}, \boldsymbol{\omega}_{cd}, \dot{\boldsymbol{\omega}}_{cd}\right\}$ is the set of corrected orientation behavior demonstrations. Next, we describe our proposed general learning representation for the feedback model.
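Eq. (9) amounts to inverting the transformation system of Eq. (1) on the corrected demonstrations and subtracting the nominal forcing term; a minimal sketch, with illustrative gains and assuming the quaternion log error has been precomputed per timestep:

```python
import numpy as np

def target_coupling_term(omega_dot_cd, omega_cd, log_err_cd, f, tau,
                         alpha_w=25.0, beta_w=25.0 / 4.0):
    # Eq. (9): invert the transformation system (Eq. 1) on corrected demos,
    # then subtract the nominal forcing term f to isolate the coupling term.
    # log_err_cd holds per-timestep values of 2*log(Qg o Q_cd*), shape (T, 3).
    return (tau * omega_dot_cd
            - alpha_w * (beta_w * log_err_cd - tau * omega_cd)
            - f)
```

By construction, feeding back angular accelerations generated by Eq. (1) with a known coupling term recovers exactly that coupling term.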
III-C Phase-Modulated Neural Network Structure
We use neural network (NN) structures for representing feedback models due to their ability to learn task-relevant feature representations of high-dimensional inputs from data. In this paper, we improve upon our previous work [4], in which we used a regular fully-connected feedforward neural network (FFNN) to represent the feedback model. Our new neural network design is a variant of the radial basis function network (RBFN) [19], which we call the phase-modulated neural network (PMNN), depicted in Figure 3. The PMNN has an embedded structure that allows encoding a feedback model's dependency on the movement phase, which an FFNN structure lacks. We expect a PMNN to model human adaptation better than an FFNN because the same sensory deviation (NN input) may occur at different movement phases, but the form of the adaptation (NN output) will most likely be different. An alternative way of modeling phase-dependent adaptation behavior is to use an FFNN and include both the phase variable $p$ and phase velocity $u$ as inputs, together with the sensor trace deviations $\Delta \mathbf{S}$. However, there is then no convergence guarantee on the adapted motion plan, because the coupling term is not guaranteed to converge to zero; hence we may still need to hand-design an output post-processing step similar to [4] to ensure convergence. The PMNN, on the other hand, guarantees convergence due to the way we embed the information of the phase velocity into the structure.
The PMNN consists of:

- Input layer: the input is the sensor trace deviation $\Delta \mathbf{S}$.

- Regular hidden layers: these perform non-linear feature transformations on the high-dimensional inputs. If there are $L$ such layers, the output of the $l$-th layer is:

$$\mathbf{h}^{(l)} = a_l\left(\mathbf{W}_{l} \mathbf{h}^{(l-1)} + \mathbf{b}_{l}\right)$$

where $a_l$ is the activation function of the $l$-th hidden layer, which can be tanh, ReLU, or others. $\mathbf{W}_{1}$ is the weight matrix between the input layer and the first hidden layer, $\mathbf{W}_{l}$ is the weight matrix between the $(l-1)$-th and the $l$-th hidden layers, and $\mathbf{b}_{l}$ is the bias vector at the $l$-th hidden layer, with $\mathbf{h}^{(0)} = \Delta \mathbf{S}$.

- Final hidden layer with phase kernel modulation: this special and final hidden layer takes care of the dependency of the model on the movement phase. The output of this layer is $\mathbf{G}$, defined as:

$$\mathbf{G} = \mathbf{m} \odot \left(\mathbf{W}_{L+1} \mathbf{h}^{(L)} + \mathbf{b}_{L+1}\right) \quad (10)$$

where $\odot$ denotes the element-wise product of vectors. $\mathbf{m}$ is the phase kernel modulation vector, and each component $m_i$ is defined as:

$$m_i = \frac{\psi_i\left(p\right)}{\sum_{j=1}^{N} \psi_j\left(p\right)} \, u \quad (11)$$

with phase variable $p$ and phase velocity $u$, which come from the second-order canonical system defined in Equations 3 and 4. $\psi_i(p)$ is the radial basis function (RBF) as defined in Equation 6. We use phase RBF kernels both in the PMNNs and in the DMP representation. The phase kernel centers have equal spacing in time, and we place these centers in the same way in the DMPs and in the PMNNs.

- Output layer: the output is the one-dimensional coupling term $c$:

$$c = \mathbf{w}_{o}^{T} \mathbf{G} \quad (12)$$

where $\mathbf{w}_{o}$ is the weight vector. Please note that there is no bias in the output layer; hence if $\mathbf{G} = \mathbf{0}$ (which occurs when the phase velocity $u$ is zero), then $c$ is also zero. This ensures that $c$ is initially zero when a primitive is started. $c$ will also converge to zero, because the phase velocity $u$ converges to zero. This ensures the convergence of the adapted motion plan.
For an $n$-dimensional coupling term, we use $n$ separate PMNNs with the same input vector $\Delta \mathbf{S}$; the output of each PMNN corresponds to one dimension of the coupling term. This separation allows each network to be optimized independently of the others.
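The forward pass of a single-output PMNN can be sketched as follows; the parameter naming is ours, not the paper's code, and the gains and sizes are illustrative:

```python
import numpy as np

def pmnn_forward(delta_s, p, u, params, centers, widths):
    # Regular hidden layers: nonlinear feature transforms of delta_s
    h = delta_s
    for W, b in zip(params['W'], params['b']):
        h = np.tanh(W.dot(h) + b)
    # Phase kernel modulation vector m (Eq. 11): zero whenever u is zero
    psi = np.exp(-widths * (p - centers) ** 2)
    m = psi / (psi.sum() + 1e-10) * u
    # Final hidden layer (Eq. 10): element-wise modulation by m
    G = m * (params['W_phase'].dot(h) + params['b_phase'])
    # Output layer (Eq. 12): linear and bias-free, so c = 0 whenever u = 0
    return params['w_out'].dot(G)
```

The bias-free output layer makes the convergence property directly testable: for zero phase velocity, the coupling term is exactly zero regardless of the sensory input.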
We implemented the PMNN in TensorFlow [20]. To avoid overfitting, we used the dropout technique introduced in [21].
IV Learning Tactile Feedback Models: System Overview and Experimental Setup
This work focuses on learning to correct tactile-driven manipulation with tools. Our experimental scenario involves a demonstrator teaching our robot to perform a scraping task, utilizing a hand-held tool to scrape paint off the surface of a dry-erase board (see Figure 4). The system is taught this skill at a nominal tilt angle and needs to correct when the board is tilted away from that default angle. Neither vision nor a motion capture system is used; thus we rely solely on tactile sensing to inform the correction. One of the main challenges is that the tactile sensors interact with the board only indirectly, i.e. through the tool adapter and the scraping tool via a non-rigid contact, and the robot does not explicitly encode the tool kinematics model. This makes hand-designing a feedback gain matrix difficult. Next, we explain the experimental setup and some lessons learned from the experiments.
IV-A Hardware
The demonstrations were performed on the right arm and the right hand of our bimanual robot. The arm is a 7-degree-of-freedom (DoF) Barrett WAM arm, which is also equipped with a 6D force-torque (FT) sensor at the wrist. The hand is a Barrett hand whose left and right fingers are equipped with biomimetic tactile sensors (BioTacs) [22]. The two BioTac-equipped fingers were set up to perform a pinch grasp on a tool adapter. The tool adapter is a 3D-printed object designed to hold a scraping tool with an 11 mm-wide tooltip.
The dry-erase board was mounted on a tilt stage whose orientation can be adjusted to create static tilts in roll and/or pitch with respect to the robot's global coordinates, as shown in Figure 4. Two digital protractors (Wixey WR 300 Digital Angle Gauge) were used to measure the tilt angles during the experiment.
IV-B Robot's Environmental Settings and Human Demonstrations with Sensory Traces Association
For our experiment, we considered 5 different settings, each associated with a specific roll angle of the tilt stage. At each setting, we fixed the pitch angle and maintained the scraping path at roughly the same height. Hence, we assume that among the 6D pose action (x-y-z-pitch-roll-yaw), the necessary correction is only in the roll-orientation. For each setting, we collected 15 demonstrations. One roll angle is selected as the nominal setting, while the remaining settings become the corrected ones.
For the demonstrated actions, we recorded the 6D pose trajectory of the right-hand end-effector at a 300 Hz rate, and along with these demonstrations, we also recorded the multi-dimensional sensory traces associated with this action. The sensory traces are the 38-dimensional tactile signals from the left and right BioTacs' electrodes, sampled at 100 Hz.
IV-C Learning Pipeline Details and Lessons Learned
DMPs provide kinematic plans to be tracked with a position control scheme. However, for tactile-driven contact manipulation tasks such as the scraping task in this paper, position control alone is not sufficient. In order to attain consistent tactile signals on task repetitions (during the demonstrations as well as during unrolling of the learned feedback models), similar contact force profiles need to be applied. Hence, force control is required.
Moreover, while it is possible to perform corrected demonstrations solely by humans, the sensor traces obtained might be significantly different from the traces obtained during the robot's execution of the motion plan. This is problematic, because the inputs to the feedback models would then differ between the learning and prediction phases. Hence, we instead let the robot execute the nominal plans and only provide corrections by manually adjusting the robot's execution at different settings as necessary.
Therefore, we use the force-torque (FT) sensor in the robot's right wrist for FT control, with two purposes: (1) to maintain tooltip contact with the board, such that consistent tactile signals are obtained, and (2) to provide compliance, allowing the human demonstrator to demonstrate corrective actions as the robot executes the nominal behavior.
For simplicity, we set the force control set points in our experiment to be constant. The force control set point needs to be chosen carefully: if the downward force (in the z-axis direction) for contact maintenance is too large, friction will prevent the robot from executing the corrections commanded by the feedback model. We found that 1 Newton is a reasonable value for the downward force control set point. Regarding the learning process pipeline as depicted in Figure 2, here we provide the details of our experiment:
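The constant-set-point force regulation described above can be sketched as a simple PI loop; the gains, names, and toy plant below are hypothetical, not the robot's actual controller:

```python
def pi_force_step(f_desired, f_measured, integral, dt, kp=0.5, ki=2.0):
    # One step of PI force control toward the set point (e.g. 1 N downward):
    # proportional action on the current error plus an integral term that
    # removes steady-state error against friction and modeling offsets.
    error = f_desired - f_measured
    integral += error * dt
    command = kp * error + ki * integral
    return command, integral
```

Against a simple first-order contact model, this loop settles at the commanded contact force, which is the behavior needed to keep the tactile signals consistent across repetitions.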

1) Nominal primitives acquisition: While the robot operates in gravity-compensation mode and the tilt stage is at the nominal roll angle, the human demonstrator guided the robot's hand to kinesthetically perform the scraping task, which can be divided into three stages, each corresponding to a movement primitive:


(a) primitive 1: starting from its home position above the board, go down (in the z-axis direction) until the scraping tool makes contact with the scraping board's surface (no orientation correction at this stage),

(b) primitive 2: correct the tooltip orientation such that it makes a full flat tooltip contact with the surface,

(c) primitive 3: go forward in the y-axis direction while scraping paint off the surface, applying orientation corrections as necessary to maintain full flat tooltip contact with the surface.
We used the Zero Velocity Crossing (ZVC) method [23] and local minima search refinement on the velocity signals in the z and y axes to find the segmentation points of primitives 1 and 3, respectively. The remaining part, between the end of primitive 1 and the beginning of primitive 3, becomes primitive 2. We encode each of these primitives with position and orientation DMPs.
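The ZVC idea on a 1-D velocity signal can be sketched as follows (the local minima search refinement is omitted, and the helper name is ours):

```python
import numpy as np

def zero_velocity_crossings(vel, eps=1e-3):
    # Candidate segmentation points: samples where the velocity changes
    # sign, or falls below eps in magnitude (near-zero velocity).
    vel = np.asarray(vel)
    sign_change = np.where(np.diff(np.signbit(vel)))[0]
    near_zero = np.where(np.abs(vel) < eps)[0]
    return np.union1d(sign_change, near_zero)
```

On a recorded axis velocity, the returned indices mark candidate boundaries between primitives, which a local minima search around each candidate can then refine.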
TABLE I: Force-torque control schedule for steps 2-4.

         | Primitive 1 | Primitive 2      | Primitive 3
  Step 2 | -           | z 1 N            | z 1 N
  Step 3 | -           | z 1 N, roll 0 Nm | z 1 N, roll 0 Nm
  Step 4 | -           | z 1 N            | z 1 N

For the following pipeline steps (2, 3, and 4), Table I indicates which force-torque control mode is active during each primitive. "z 1 N" refers to the 1 Newton downward z-axis proportional-integral (PI) force control, which ensures that consistent tactile signals are obtained on repetitions of the task; this is important for learning and for making correction predictions properly. "roll 0 Nm" refers to the roll-orientation PI torque control at 0 Newton-meter, which allows corrective action demonstration.


2) Expected sensor traces acquisition: Still with the tilt stage at the nominal roll angle, we unroll the nominal primitives 15 times and record the tactile sensor traces. We encode each dimension of the 38-dimensional sensor traces as $\mathbf{S}_{\text{expected}}$, using the standard DMP formulation.
Fig. 5: (Left) comparison of regression results on primitives 2 and 3 using different neural network structures; (Middle) comparison of regression results on primitives 2 and 3 using separated feature learning (PCA or autoencoder with phase kernel modulation) versus embedded feature learning (PMNN); (Right) the top 10 dominant regular hidden layer features for each phase RBF in primitive 2, roll-orientation coupling term, displayed in yellow.
3) Feedback model learning: Now we vary the tilt stage's roll angle across the four non-nominal settings, one at a time, to create different environmental settings. At each setting, we let the robot unroll the nominal behavior. Besides the downward force control for contact maintenance, we now also activate the roll-orientation PI torque control at 0 Newton-meter throughout primitives 2 and 3. This allows the human demonstrator to demonstrate the roll-orientation correction required to maintain full flat tooltip contact relative to the now-tilted scraping board. We recorded 15 demonstrations for each setting, from which we extracted the supervised dataset for the feedback model, i.e. the pairs of sensory trace deviations $\Delta \mathbf{S}$ and target coupling terms $\mathbf{c}_{\text{target}}$ as formulated in Equation 9. Afterwards, we learn the feedback models from this dataset using the PMNN.

4) DMP and feedback model unrolling/testing: We test the learned feedback models on different settings on the robot.
V Experiments
To evaluate the performance of the learned feedback model, we first evaluate the regression and generalization ability of the PMNNs, which were trained offline on the demonstration data. Second, we show the superiority of PMNNs over FFNNs as a representation for learning feedback models. Third, we investigate the importance of learning the feature representation and the phase dependencies together within the framework of learning feedback models. Fourth, we show the significance of phase modulation in feedback model learning. Finally, we evaluate the learned feedback model's performance in making online predictions of action corrections on a real robot.
We evaluate feedback models only on primitives 2 and 3, for roll-orientation correction. In primitive 1, we deem that no action correction is needed, because the height of the dry-erase board surface is maintained constant across all settings.
As the error metric, we use the normalized mean squared error (NMSE), i.e. the mean squared prediction error divided by the target coupling term's variance. To evaluate the learning performance of each model in our experiments, we perform a leave-one-demonstration-out test. In this test, we perform $K$ iterations of training and testing, where $K$ is the number of demonstrations per setting. At the $k$-th iteration, the data points of the $k$-th demonstration of all settings are left out as unseen data for generalization testing, while the remaining demonstrations' data points³ are shuffled randomly and split into training, validation, and testing sets.

³Each demonstration, depending on the data collection sampling rate and demonstration duration, provides hundreds or thousands of data points.
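The metric and the cross-validation protocol can be sketched as follows (the helper names are ours):

```python
import numpy as np

def nmse(y_pred, y_true):
    # Normalized mean squared error: MSE divided by the target's variance,
    # so that always predicting the target mean yields an NMSE of 1.
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean((y_pred - y_true) ** 2) / np.var(y_true)

def leave_one_demo_out_splits(num_demos):
    # Yield (held_out_demo, remaining_demos) index pairs, one per iteration
    for k in range(num_demos):
        yield k, [i for i in range(num_demos) if i != k]
```

An NMSE well below 1 therefore indicates that the model explains a substantial fraction of the target coupling term's variance.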

We record the training-validation-testing-generalization NMSEs corresponding to the lowest generalization NMSE across learning steps. We report the mean and standard deviation of the training-validation-testing-generalization NMSEs across the $K$ iterations. On all models we evaluated, we use tanh as the activation function of the hidden layer nodes. We use Root Mean Square Propagation (RMSProp) [24] as the gradient descent optimization algorithm and set the dropout [21] rate to 0.5.

V-A Fitting and Generalization Evaluation of PMNNs
The results for primitives 2 and 3, using the PMNN structure with one regular hidden layer of 100 nodes, are shown in Table II. The PMNNs achieve good training, validation, and testing results, and reasonable generalization results for both primitives.
TABLE II: Roll-orientation coupling term learning NMSE.

          | Training  | Validation | Testing   | Generalization
  Prim. 2 | 0.15±0.05 | 0.15±0.05  | 0.16±0.06 | 0.36±0.19
  Prim. 3 | 0.22±0.05 | 0.22±0.05  | 0.22±0.05 | 0.32±0.13
V-B Performance Comparison between FFNN and PMNN
We compare the performance of FFNN and PMNN. For the PMNN, we test two structures: one with no regular hidden layer, and one with a single regular hidden layer of 100 nodes. For the FFNN, we use two hidden layers with 100 and 25 nodes, which is equivalent to the PMNN with one regular hidden layer of 100 nodes but with the phase modulation deactivated. The results can be seen in Figure 5 (Left). The PMNN with one regular hidden layer of 100 nodes demonstrates the best performance among these structures. The PMNN with one regular hidden layer outperforms the one without, most likely because of its richer learned feature representation, without overfitting to the data.

V-C Comparison between Separated versus Embedded Feature Representation and Phase-Dependent Learning
We also compare the effect of separating versus embedding feature representation learning with the overall parameter optimization under phase modulation. Chebotar et al. [5] used PCA for feature representation learning, which was separated from the phase-dependent parameter optimization using reinforcement learning. On the other hand, the PMNN embeds feature learning together with the parameter optimization under phase modulation in an integrated process.
In this experiment, we used PCA retaining 99% of the overall data variance, reducing the data dimensionality to 7 and 6 (from the original 38) for primitives 2 and 3, respectively. In addition, we also implemented an autoencoder, a non-linear dimensionality reduction method, as a substitute for PCA in representation learning. The dimensions of the autoencoders' latent spaces were 7 and 6 for primitives 2 and 3, respectively. For the PMNNs, we used two kinds of networks: one with a regular hidden layer of 6 nodes (so that it is comparable with the PCA counterpart), and one with a regular hidden layer of 100 nodes.
Figure 5 (Middle) illustrates the superior performance of the PMNNs, due to the feature learning being performed together with the phase-dependent parameter optimization. Of the two PMNNs, the one with more nodes in the regular hidden layer performs better, because it can represent the mapping more accurately, while not overfitting to the data. Based on these evaluations, we decided to use PMNNs with one regular hidden layer of 100 nodes and 25 phase-modulated nodes in the final hidden layer for the subsequent experiments.
V-D Evaluation of Movement Phase Dependency
Here we visualize the trained weight matrix mapping the outputs of the 100 nodes in the regular hidden layer to the 25 nodes in the final hidden layer being modulated by the phase RBFs. This weight matrix has dimensions 25 × 100, and each row shows how the 100 node outputs (or "features") of the regular hidden layer are weighted into a particular phase-RBF-modulated node. In Figure 5 (Right), we display the top 10 dominant regular hidden layer node outputs for each phase-RBF-modulated node (in yellow), while the rest (in blue) are the less dominant ones. We see that the priority ranking differs between phase-RBF-modulated nodes, suggesting that the feedback depends on the movement phase.
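This visualization reduces to ranking, per row of the 25 × 100 weight matrix, the features with the largest absolute weights; a sketch (the helper name is ours):

```python
import numpy as np

def dominant_features(W_phase, k=10):
    # For each phase-RBF-modulated node (row of W_phase), return the indices
    # of the k regular-hidden-layer features with largest absolute weight.
    return np.argsort(-np.abs(W_phase), axis=1)[:, :k]
```

Comparing the returned index sets across rows reveals whether different phases rely on different learned features.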
V-E Unrolling the Learned Feedback Model on the Robot
We show snapshots of our robot scraping experiment at one of the tilted settings of the tilt stage. In particular, we compare the nominal plan execution (top figures, (a) to (d)) and the adapted plan execution (bottom figures, (e) to (h), using the trained feedback models). From left to right ((a) to (d), and (e) to (h)), the snapshots show subsequent phases of the plan execution. The captions ((a) to (h)) show the readings of the digital angle gauge mounted on top of the middle finger of the hand. We see that when the coupling term is turned off (nominal plan execution, top figures), no correction is applied to the tooltip orientation, and the scraping result is worse than when the online adaptation is applied (adapted plan execution, bottom figures).
Figure 14 shows the coupling term (top) alongside the corresponding sensor trace deviation of one of the electrodes (bottom) during plan execution at 4 different environmental settings, as specified in captions (a)-(d). We compare several cases: human demonstrations (blue), the human demonstrations' mean trajectory (dashed black), the range of demonstrations within one standard deviation of the mean trajectory (solid black), robot unrolling of the nominal behavior (green), and robot unrolling while applying the coupling term computed online by the trained feedback model (red). In the top plots, we see that the trained feedback model can differentiate between settings and apply approximately the correct amount of correction. When the coupling term computed online by the trained feedback model is applied, the sensor trace deviation is also close to those of the demonstrations, as shown in the bottom plots.
Finally, the video at https://youtu.be/7Dx5imy1Kcw shows the scraping execution at two roll-angle settings of the tilt stage, while applying the corrections predicted online by the trained feedback model.
VI Conclusion
We introduced a general framework for learning feedback models from demonstrations, mapping sensory trace deviations to action corrections. In particular, we introduced phase-modulated neural networks (PMNNs), which allow fitting phase-dependent feedback models while preserving the convergence properties of DMPs. Finally, we demonstrated the superior learning performance of our PMNN-based framework compared to state-of-the-art methods, as well as its capability to perform online adaptation on a real robot.
Appendix
A unit quaternion is a hypercomplex number which can be written as a vector $Q = [u, \mathbf{q}^T]^T$ with $\|Q\| = 1$, where $u$ is the real scalar part and $\mathbf{q}$ is the vector of the three imaginary components of the quaternion. For computation with orientation trajectories, several operations need to be defined as follows:

quaternion composition operation:
$$Q_A \circ Q_B = \begin{bmatrix} u_A u_B - \mathbf{q}_A^T \mathbf{q}_B \\ u_A \mathbf{q}_B + u_B \mathbf{q}_A + \mathbf{q}_A \times \mathbf{q}_B \end{bmatrix} \quad (13)$$
quaternion conjugation operation:
$$Q^* = [u, -\mathbf{q}^T]^T \quad (14)$$
logarithm mapping ($\log$ operation), which maps an element of SO(3) to so(3), is defined as:
$$\log(Q) = \begin{cases} \arccos(u)\, \dfrac{\mathbf{q}}{\|\mathbf{q}\|}, & \mathbf{q} \neq \mathbf{0} \\ [0, 0, 0]^T, & \text{otherwise} \end{cases} \quad (15)$$
exponential mapping ($\exp$ operation, the inverse of the $\log$ operation), which maps an element of so(3) to SO(3):
$$\exp(\boldsymbol{\omega}) = \begin{cases} \left[\cos(\|\boldsymbol{\omega}\|),\, \sin(\|\boldsymbol{\omega}\|)\, \dfrac{\boldsymbol{\omega}^T}{\|\boldsymbol{\omega}\|}\right]^T, & \boldsymbol{\omega} \neq \mathbf{0} \\ [1, 0, 0, 0]^T, & \text{otherwise} \end{cases} \quad (16)$$
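The four operations above can be sketched in Python/NumPy as follows; this is a minimal implementation of the standard unit-quaternion formulas, with function names of our own choosing:

```python
import numpy as np

# Quaternions are stored as Q = [u, q1, q2, q3], scalar part first.

def quat_compose(Q1, Q2):
    """Quaternion composition Q1 o Q2, Eq. (13)."""
    u1, v1 = Q1[0], Q1[1:]
    u2, v2 = Q2[0], Q2[1:]
    u = u1 * u2 - v1 @ v2
    v = u1 * v2 + u2 * v1 + np.cross(v1, v2)
    return np.concatenate(([u], v))

def quat_conjugate(Q):
    """Quaternion conjugation Q*, Eq. (14): negate the vector part."""
    return np.concatenate(([Q[0]], -Q[1:]))

def quat_log(Q):
    """Logarithm map, Eq. (15): unit quaternion -> 3D rotation vector."""
    u, v = Q[0], Q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(u, -1.0, 1.0)) * v / n

def quat_exp(w):
    """Exponential map, Eq. (16): 3D rotation vector -> unit quaternion."""
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(n)], np.sin(n) * w / n))
```

As a sanity check, `quat_log(quat_exp(w))` recovers `w` for small rotation vectors, and composing a unit quaternion with its conjugate yields the identity quaternion `[1, 0, 0, 0]`.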
Acknowledgment
We thank Gerald E. Loeb for his support with the BioTac sensors, Oliver Kroemer for suggesting the scraping task testbed, and Ludovic Righetti, Vincent Enachescu, and Ryan Julian for reviewing initial drafts of the paper.
References
 [1] P. Pastor, L. Righetti, M. Kalakrishnan, and S. Schaal, “Online movement adaptation based on previous sensor experiences,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, pp. 365–371.
 [2] P. Pastor, M. Kalakrishnan, F. Meier, F. Stulp, J. Buchli, E. Theodorou, and S. Schaal, “From dynamic movement primitives to associative skill memories,” Robotics and Autonomous Systems, vol. 61, no. 4, pp. 351–361, 2013.
 [3] A. Rai, F. Meier, A. Ijspeert, and S. Schaal, "Learning coupling terms for obstacle avoidance," in IEEE-RAS International Conference on Humanoid Robots, 2014, pp. 512–518.
 [4] A. Rai, G. Sutanto, S. Schaal, and F. Meier, “Learning feedback terms for reactive planning and control,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2017.
 [5] Y. Chebotar, O. Kroemer, and J. Peters, “Learning robot tactile sensing for object manipulation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 3368–3375.
 [6] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: Learning attractor models for motor behaviors,” Neural Comput., vol. 25, no. 2, pp. 328–373, 2013.
 [7] A. Kramberger, A. Gams, B. Nemec, and A. Ude, "Generalization of orientational motion in unit quaternion space," in IEEE-RAS International Conference on Humanoid Robots, 2016, pp. 808–813.
 [8] A. Ude, B. Nemec, T. Petric, and J. Morimoto, “Orientation in cartesian space dynamic movement primitives,” in IEEE International Conference on Robotics and Automation, 2014, pp. 2997–3004.
 [9] B. Nemec and A. Ude, “Action sequencing using dynamic movement primitives,” Robotica, vol. 30, no. 05, pp. 837–846, 2012.
 [10] D.-H. Park, H. Hoffmann, P. Pastor, and S. Schaal, "Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields," in IEEE International Conference on Humanoid Robots, 2008, pp. 91–98.
 [11] H. Hoffmann, P. Pastor, D.-H. Park, and S. Schaal, "Biologically-inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance," in IEEE International Conference on Robotics and Automation, 2009, pp. 2587–2592.
 [12] M. Khansari, E. Klingbeil, and O. Khatib, "Adaptive human-inspired compliant contact primitives to perform surface–surface contact under uncertainty," The International Journal of Robotics Research, vol. 35, no. 13, pp. 1651–1675, 2016.
 [13] H. Hoffmann, Z. Chen, D. Earl, D. Mitchell, B. Salemi, and J. Sinapov, “Adaptive robotic tool use under variable grasps,” Robotics and Autonomous Systems, vol. 62, no. 6, pp. 833–846, 2014.
 [14] F. J. AbuDakka, B. Nemec, J. A. Jørgensen, T. R. Savarimuthu, N. Krüger, and A. Ude, “Adaptation of manipulation skills in physical contact with the environment to reference force profiles,” Autonomous Robots, vol. 39, no. 2, pp. 199–217, Aug 2015.
 [15] A. Gams, M. Denisa, and A. Ude, “Learning of parametric coupling terms for robotenvironment interaction,” in IEEE International Conference on Humanoid Robots, 2015, pp. 304–309.
 [16] A. Kupcsik, M. Deisenroth, J. Peters, L. Ai Poh, V. Vadakkepat, and G. Neumann, "Model-based contextual policy search for data-efficient generalization of robot skills," Artificial Intelligence, vol. 247, pp. 415–439, 2017.
 [17] J. Sung, J. K. Salisbury, and A. Saxena, “Learning to represent haptic feedback for partiallyobservable tasks,” in IEEE International Conference on Robotics and Automation, 2017, pp. 2802–2809.
 [18] J. Kober, B. Mohler, and J. Peters, “Learning perceptual coupling for motor primitives,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 834–839.
 [19] C. Bishop, “Improving the generalization properties of radial basis function neural networks,” Neural Computation, vol. 3, no. 4, pp. 579–588, 1991.

 [20] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," 2015. [Online]. Available: http://download.tensorflow.org/paper/whitepaper2015.pdf
 [21] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
 [22] N. Wettels, V. Santos, R. Johansson, and G. Loeb, “Biomimetic tactile sensor array.” Advanced Robotics, vol. 22, no. 8, pp. 829–849, 2008.
 [23] A. Fod, M. J. Matarić, and O. C. Jenkins, “Automated derivation of primitives for movement classification,” Autonomous robots, vol. 12, no. 1, pp. 39–54, 2002.
 [24] T. Tieleman and G. Hinton, “Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, 2012.