Robot imitation is a large area of study in robotics, which focuses on how a robot can learn an action from user demonstrations. Popular frameworks for robot imitation typically record the trajectories the robot or human performs during demonstrations, and successfully reproduce the average trajectory with the robot end-effector in Cartesian space. The most prominent examples of these frameworks are Programming by Demonstration and Dynamic Motion Primitives. These algorithms shine for actions that are governed by geometry, such as performing gestures in the air, or performing simple manipulation tasks of moving an object from A to B. Additional work has been performed to introduce active compliance and obstacle avoidance.
However, there is a large body of actions that cannot be described solely in terms of human or robot geometric trajectories. In addition to joint or Cartesian positions, visual and force features provide relevant information when describing actions such as painting or ironing. Regarding visual features, recent studies have focused on learning end-to-end mappings directly from raw images to the robot joint space. These works can involve large sets of images for pre-training, robot-environment physical interaction, and additional hours of training.
Continuous Goal-Directed Actions (CGDA) is a feature-agnostic robot imitation framework. Actions are encoded as time series of the variation of scalar features extracted from sensor data during user demonstrations. While this framework provides a rich infrastructure for generalizing actions, this advantage comes at a cost. Final robot joint or end-effector Cartesian trajectories are not necessarily encoded within the model. Their components may have to be completely recomputed in order to comply with additional goals such as vision or force, or may be discarded manually or through automatic feature selection algorithms. This recomputation is studied as an optimization problem, and has been solved through evolutionary algorithms in simulated environments. CGDA requires no previous interaction with the environment and no additional training time.
In this paper, the online evolutionary strategy paradigm for CGDA execution is presented. The following contributions and consequences result from this change of paradigm.
Motor execution has been shifted and merged into the CGDA planning loop, enabling online adaptation for changing environments.
We demonstrate that the total time dedicated to mental simulation processes between motor executions is no longer dependent on the duration of the action.
The order of magnitude of the time between motor executions has been reduced from minutes with Incrementally Evolved Trajectories (IET) to seconds with the presented online evolutionary strategy, Online Evolved Trajectories (OET).
The “paint” action, Figure 1, has been used to evaluate OET for a pure visual feature action, and an “iron” action was used for kinesthetic and force features.
II CGDA Framework and Strategies
CGDA is a framework for generalizing, recognizing and executing actions based on scalar features extracted from sensor data. In CGDA, an action is modelled as a trajectory in a feature space of scalar features, to represent the changes it produces on the environment. Scalar features used in CGDA, in addition to the geometric trajectory of a specific robot or human configuration, may include visual features of the environment, forces exerted by a given actuator, or even Cartesian positions of moving objects in the environment. Achieving a state of the environment in which the scalar features extracted from sensor perception match those of a given modelled action is studied as an optimization problem in the execution stage. The features are used as constraints to compute or recompute robot joint trajectories.
In CGDA not only the goal set of features, but also intermediate goals, must be achieved. An action is sliced into intermediate goals, whose number is computed as the average duration of the user demonstrations divided by the minimum time interval between intermediate goals. The generalized representation of an action is a trajectory in the feature space, with one dimension per scalar feature, composed of these intermediate goals as defined in (1).
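The slicing into intermediate goals can be illustrated with a minimal sketch. The function names, the rounding of the number of goals, and the generalization by time normalization, linear interpolation and averaging are assumptions made for illustration, and are not necessarily the procedure of the CGDA implementation.

```python
import numpy as np


def num_intermediate_goals(avg_duration_s, min_interval_s):
    """Average demonstration duration divided by the minimum interval.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(avg_duration_s / min_interval_s))


def generalize(demos, n_goals):
    """Build an (n_goals, m) generalized feature trajectory.

    demos: list of (T_i, m) arrays of scalar features, sampled uniformly
    in time. Each demonstration is normalized to the interval [0, 1],
    resampled at n_goals instants by linear interpolation, and the
    resampled demonstrations are averaged (assumed procedure).
    """
    grid = np.linspace(0.0, 1.0, n_goals)
    resampled = []
    for demo in demos:
        demo = np.asarray(demo, dtype=float)
        t = np.linspace(0.0, 1.0, len(demo))
        columns = [np.interp(grid, t, demo[:, k]) for k in range(demo.shape[1])]
        resampled.append(np.column_stack(columns))
    return np.mean(resampled, axis=0)
```

For instance, two demonstrations of a single "percentage painted" feature with different lengths are reduced to the same number of intermediate goals, so that they can be averaged row by row.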
Recognition of an action is performed by comparing an observed action with a generalized action. The discrepancy metric used is the sum, over all features, of the cost of aligning the observed and generalized trajectories through each feature cost matrix. Each cost matrix is computed within Dynamic Time Warping as in (3).
Let be the duration of the observed action, is computed as .
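As a rough illustration of this discrepancy, the sketch below sums one Dynamic Time Warping alignment cost per feature. The absolute difference used as local cost, the unconstrained step pattern, and the function names are assumptions of this sketch rather than details taken from the CGDA implementation.

```python
import numpy as np


def dtw_cost(a, b):
    """Accumulated cost of the optimal DTW alignment of two 1-D series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            acc[i, j] = local + min(acc[i - 1, j],      # insertion
                                    acc[i, j - 1],      # deletion
                                    acc[i - 1, j - 1])  # match
    return acc[n, m]


def discrepancy(observed, generalized):
    """Sum of per-feature DTW costs between two (T, m) feature trajectories."""
    observed = np.asarray(observed, dtype=float)
    generalized = np.asarray(generalized, dtype=float)
    return sum(dtw_cost(observed[:, k], generalized[:, k])
               for k in range(observed.shape[1]))
```

Because the alignment is elastic in time, two trajectories that traverse the same feature values at different speeds obtain a low discrepancy, even if they look dissimilar when plotted sample by sample.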
For execution, evolutionary algorithms are used to compute robot joint trajectories. Three different strategies for CGDA execution have been previously proposed: Full Trajectory Evolution (FTE), Individual Evolution (IE), and Incrementally Evolved Trajectories (IET) .
In FTE, Algorithm 1, each individual of the population is composed of as many parameters as the number of intermediate goals multiplied by the number of used degrees of freedom of the robot. The full robot joint trajectory is generated attempting to reach all the intermediate goals simultaneously. Execution and recognition are performed in an internal “mental” simulation, where fitness is the recognition discrepancy. Termination conditions are evaluated up to a maximum number of times, while additionally monitoring the evolution of the fitness.
In IE, Algorithm 2, each individual is composed of one parameter per used degree of freedom. Joint positions are generated independently for each intermediate goal.
In IET, Algorithm 3, joint positions are generated for each intermediate goal after the mental execution of the trajectory computed for the previous intermediate goals.
Experimental evidence from previous publications has determined FTE to be the strategy that requires the most evaluations for fitness convergence. The main intuition behind this large number of required evaluations is that evolutionary algorithms are greatly affected by the size of the search space. In FTE, the dimension of the search space is the number of intermediate goals multiplied by the number of used degrees of freedom, and is therefore proportional to the number of intermediate goals.
IE is the strategy that requires the fewest evaluations for fitness convergence of the three presented strategies, with a search space that has one dimension per used degree of freedom. However, joint positions are generated independently for each intermediate goal, which leads to an inherent issue. In the case of final intermediate goals, this means accomplishing the majority of a final goal with a single robot joint position. Taking a “paint” action as the use case, accomplishing this is not realistic. Fitness convergence may result in the same robot joint position for two or more different intermediate goals. This is not only a duplicate effect, but also represents a lost step, or a loss of time that could have been used to achieve a different goal contributing to the general solution.
In IET, the robot joint trajectory that has been computed to achieve the previous intermediate goals is executed in the simulation before generating each new robot joint position. This provides awareness of the previously achieved intermediate goals, avoiding the inherent issue described for IE. The search space has the same dimension as in IE.
III Online Evolutionary Strategies
In the previously presented CGDA execution strategies, there was a mental process of execution and recognition in a simulated environment while monitoring fitness evolution, and finally motor execution was performed. In this sense, they can be considered offline planning algorithms. The general layout of an offline CGDA execution evolutionary strategy is summarized in Algorithm 4, where planning termination conditions encompass all the loop conditions.
This paper presents a new layout for CGDA execution evolutionary strategies, namely online evolutionary strategies. The general layout of an online CGDA execution evolutionary strategy is summarized in Algorithm 5.
With respect to the offline layout, motor execution is performed:
(1) Once per intermediate goal.
(2) After a single mental process loop.
The consequences are, respectively:
(1) Movements should occur as many times as there are intermediate goals.
(2) The number of repetitions of mental process loops between motor executions is reduced by a factor equal to the number of intermediate goals.
A further consequence of (2) is that the total time dedicated to mental processes between motor executions no longer depends on the number of intermediate goals, and is therefore independent of the duration of the action.
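The difference between both layouts can be made concrete with a toy sketch that only counts mental process loops before each motor execution. The names `plan_goal` and `execute` are hypothetical placeholders for a mental process loop and a motor execution, and the perception and localization steps of an actual online strategy are omitted.

```python
def offline_layout(n_goals, plan_goal, execute):
    """Plan every intermediate goal mentally, then execute once."""
    trajectory = []
    mental_loops = 0
    for j in range(n_goals):
        trajectory.append(plan_goal(j, trajectory))
        mental_loops += 1
    execute(trajectory)
    # One motor execution, preceded by n_goals mental process loops.
    return [mental_loops]


def online_layout(n_goals, plan_goal, execute):
    """Execute after each single mental process loop."""
    achieved = []
    loops_before_each_execution = []
    for j in range(n_goals):
        step = plan_goal(j, achieved)
        execute([step])
        achieved.append(step)
        loops_before_each_execution.append(1)
    # n_goals motor executions, each preceded by one mental process loop.
    return loops_before_each_execution
```

With 5 intermediate goals, the offline layout yields `[5]` while the online layout yields `[1, 1, 1, 1, 1]`: the wait before any given motor execution does not grow with the number of intermediate goals.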
IV The OET Algorithm
Online Evolved Trajectories (OET) is presented in this paper as a concrete implementation of an online evolutionary strategy within the CGDA framework, aimed at effectively reducing computation times for execution in real world applications. The pseudocode of this strategy is presented in Algorithm 6.
OET termination conditions are evaluated up to a maximum number of times, while additionally monitoring that the final goal has not been achieved. To introduce motor execution within the planning algorithm loop, real world sensor perception and localization steps are additionally performed.
IV-A Sensor Perception
In the Sensor Perception step, the system extracts the scalar features from the real world environment sensor data. An updated vector in the feature space of the generalized action is obtained from the current state of the world, at the current time, as in (4).
IV-B Localization
In the Localization step, the features extracted in the Sensor Perception step are used to locate the intermediate goal that corresponds to the current environment state. The objective of this step is to find the intermediate goal of the feature trajectory that minimizes the discrepancy between the perceived feature vector and the intermediate goals of the generalized action, as in equation (5).
Equation (5) depends on the index of the previously accomplished intermediate goal and on the order of the norm used for Localization, preferably the Euclidean L2 norm.
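A minimal sketch of such a Localization step is given below. The function name, the array layout, and in particular the choice of restricting the search to intermediate goals from the previously accomplished index onwards are assumptions of this sketch, since the exact form of the localization equation is not reproduced here.

```python
import numpy as np


def localize(goals, features_now, prev_index=0, p=2):
    """Index of the intermediate goal closest to the perceived features.

    goals: (n, m) array with one row per intermediate goal.
    features_now: length-m vector perceived from the real world.
    prev_index: index of the previously accomplished intermediate goal;
        earlier goals are not considered (assumption of this sketch).
    p: order of the norm (2 gives the Euclidean norm).
    """
    goals = np.asarray(goals, dtype=float)
    x = np.asarray(features_now, dtype=float)
    candidates = goals[prev_index:]
    distances = np.linalg.norm(candidates - x, ord=p, axis=1)
    return prev_index + int(np.argmin(distances))
```

For a single "percentage painted" feature with goals at 0, 25, 50, 75 and 100, a perceived value of 55 is localized at the goal with index 2, unless a later goal has already been accomplished.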
Three different evolutionary strategies for CGDA were tested in the experiments of this paper: Full Trajectory Evolution (FTE), Incrementally Evolved Trajectories (IET), and the Online Evolved Trajectories (OET). Individual Evolution (IE) was not used due to the inherent issue explained at the end of Section II. The actions chosen for the experiments were the “paint” and “iron” actions, as use cases that together include relevant visual, kinesthetic and force features.
The robotic platform used was TEO, a full-sized humanoid robot. For demonstrations of the “paint” action, a paintbrush was attached to the left end-effector of the robot, and the 6 degrees of freedom of the left arm in gravity compensation mode were used. An ASUS Xtion PRO LIVE RGB-D sensor was used to extract the percentage of painted wall. For the “iron” action demonstrations, an iron was installed as the right end-effector using custom 3D printed parts, and the 6 degrees of freedom of the right arm in gravity compensation mode were used. The CUI absolute encoders present in each of the joints of the robot were used to obtain the Cartesian position of the end-effector via forward kinematics. Finally, a JR3 force/torque sensor mounted on the right wrist of the robot was used to measure force features in the “iron” demonstrations. For all of the execution strategies, 3 of the 6 degrees of freedom of the right arm of the robot were used for the evolution, keeping all the other joints (including torso, legs and head) static.
A C++ framework was used for evolutionary computation. YARP was used for internal and robot component communications. OpenRAVE was used for the simulation environment. The experimental datasets and presented CGDA strategies have been open-sourced (https://github.com/roboticslab-uc3m/xgnitive).
Steady State Tournament (SST) has been the standard evolutionary algorithm used in CGDA implementations, and has also been used in the experiments in this paper. The presented strategies are situated a layer above evolutionary algorithms such as SST, which can be considered a back-end. Their comparison should not be affected by the selection of a specific set of back-end shared parameters. Parameters have been set to achieve reasonable execution times on a single core of a single machine.
Following this assumption, the SST parameters for all the strategies were set to a population of 10 individuals, a tournament size of 3 individuals, and an individual mutation probability of 60%. The search space of each individual was bounded between -15 and 100, which corresponds to the individual robot arm joint limits expressed in degrees.
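To make the role of these parameters concrete, the following is a minimal sketch of one common form of a steady-state tournament algorithm, written in Python for brevity rather than in the C++ of the actual experiments. The uniform crossover, the single-gene uniform mutation, the `max_evaluations` default, and all function and argument names are assumptions of this sketch and do not describe the back-end used in the paper.

```python
import random


def steady_state_tournament(fitness, dims, pop_size=10, tournament_size=3,
                            mutation_prob=0.6, bounds=(-15.0, 100.0),
                            max_evaluations=1000, seed=0):
    """Minimize `fitness` over `dims` bounded real parameters."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(dims)]
                  for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]
    evaluations = pop_size
    # Stop on zero fitness or when the evaluation budget is spent.
    while evaluations < max_evaluations and min(scores) > 0.0:
        contenders = rng.sample(range(pop_size), tournament_size)
        worst = max(contenders, key=lambda i: scores[i])
        parents = [i for i in contenders if i != worst]
        pa, pb = population[parents[0]], population[parents[1]]
        # Uniform crossover of two surviving contenders (assumed operator).
        child = [rng.choice((a, b)) for a, b in zip(pa, pb)]
        # The child is mutated with the individual mutation probability.
        if rng.random() < mutation_prob:
            k = rng.randrange(dims)
            child[k] = rng.uniform(low, high)
        # The child replaces the worst contender of the tournament.
        population[worst] = child
        scores[worst] = fitness(child)
        evaluations += 1
    best = min(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best], evaluations
```

In a CGDA-style setting, `fitness` would wrap the mental execution and recognition of a candidate joint configuration, and `dims` would be the size of the search space of the chosen strategy.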
FTE termination conditions were to reach a zero fitness value, maximum , or maximum without improvement in fitness . For IET, and are scaled by due to the outer loop, resulting in and . Finally, for OET, and are scaled by due to the outer loop, resulting in and .
The following metrics were used within the development of the experiments:
Evaluations: The total number of passes through mental recognition.
Discrepancy: The final achieved fitness .
Real Iteration Time (): Time between two contiguous motor executions, as defined in (6).
The “paint” action is a representative use case presented in previous work of the authors . While in previous work the generalized “paint” action was generated synthetically as a linear growth from 0% to 100% of the painted portion of a tracked object (a wall), this feature trajectory was now generated from 4 user demonstrations.
Each of the demonstrations was deliberately performed following a different geometrical trajectory, as depicted in Figure 2. The figure also depicts the geometrical model generated using Gaussian Mixture Models and Gaussian Mixture Regression. This method achieves painting 43.75% of the surface, as the mixture of different geometrical trajectories results in a trajectory similar to their average, which may or may not be relevant for performing the action.
Table I: Results of the CGDA execution strategies for the “paint” action. Columns: Evaluations, Discrepancy, Real Iteration Time [s], Painted Wall [%].
The average demonstration time of the “paint” action was . Selecting a low would result in an intractable value of for FTE, due to the size of its search space. was set, resulting in intermediate goals for comparison of the strategies.
The results obtained from the CGDA execution strategy experiments for “paint” are shown in Table I, where averages and standard deviations were extracted from 3 repetitions of each experiment. Figure 3 shows a comparison of the achievement of intermediate goals with each of the strategies, compared to the reference generalized action obtained from the user demonstrations.
Similar to previous experimental evidence, FTE was the strategy that took the most evaluations to converge, as a result of the size of the search space. Discrepancy was not the highest, despite the apparent lack of correlation with respect to the generalized action in Figure 3. This is due to the Dynamic Time Warping metric used in mental recognition. FTE is also the slowest strategy in terms of Real Iteration Time, accounting for all the evaluations before motor execution.
IET requires fewer evaluations and a lower Real Iteration Time than FTE, as a result of the reduced search space. However, IET has a larger discrepancy and achieves a lower percentage of painted wall than FTE. This is because IET may suffer the effects of non-optimal decisions for initial intermediate goals.
OET results in more evaluations than IET in Table I, as OET may perform different motor executions until it achieves an intermediate goal. Figure 3 is a compact representation that depicts the percentage of painted wall after achieving each intermediate goal. OET obtained the best result in terms of Real Iteration Time, with an average of 4 seconds between real motor executions. Its final achieved percentage of painted wall is also the highest, and it additionally minimizes discrepancy.
Table II: Results of the CGDA execution strategies for the “iron” action. Columns: Evaluations, Discrepancy, Real Iteration Time [s].
The generalized action for the “iron” action was generated from 4 demonstrations, depicted in Figure 4. The relevant features in this action were the end-effector Cartesian positions and the force exerted by the iron measured on its vertical axis. The objective was to descend on the ironing board, apply a force of 30 N, and then ascend again.
The figure also depicts the pure geometrical model generated using Gaussian Mixture Models and Gaussian Mixture Regression. In this case, while the reproduction was geometrically accurate, the measured force was close to zero.
The average demonstration time of the “iron” action was . was set, resulting in intermediate goals for comparison of the strategies. The results obtained from the experiments are shown in Table II, extracted from 3 repetitions of each experiment.
For the “iron” action, FTE took the maximum number of evaluations possible, composed of the initialization of the 10 individuals and reaching the termination limit with this population. FTE discrepancy and Real Iteration Time were also the highest for this action, while intermediate results were obtained with IET.
OET obtained the best overall results for the “iron” action. The average 1.4 second mark is similar to measured times of human mental simulations.
A change of paradigm in evolutionary strategies for CGDA execution, from offline to online evolutionary strategies, is presented in this paper. Previously developed algorithms for CGDA execution subscribed to a model where planning was performed in mental simulations, and the final computed trajectory was sent to the robot for motor execution. Online evolutionary strategies reduce the time dedicated to mental processes between motor executions by shifting motor execution into the planning loop.
A concrete implementation of an online evolutionary strategy, Online Evolved Trajectories (OET), has additionally been introduced. OET is an online evolutionary strategy for CGDA in real world applications, including dynamic environments or human intervention/collaboration. It enables human interventions similar to those of pure geometric approaches, enhanced by complementary features such as vision and force. These features are recorded simultaneously and agnostically. This is an improvement over previous literature on robot imitation of an “iron” action, where geometrical trajectories are learned first, and then forces are demonstrated using a separate haptic device during the execution of the previously learned geometrical trajectory. The results obtained show a notable improvement over the previous offline strategies used by the authors, not only in terms of elapsed time between motor executions, but also in terms of overall fitness of the Continuous Goal-Directed Action.
OET has opened a new range of possible real world applications to the CGDA framework. The implementation of real world actions, where the environment experiences external changes, or collaborative tasks where the user helps the robot to perform the action, is now feasible within the CGDA framework.
Future lines of research include reducing the Real Iteration Time, for instance through the use of parallelism. As it is minimized, adaptive rates can additionally be incorporated.
The research leading to these results has received funding from the RoboCity2030-III-CM project (Robótica aplicada a la mejora de la calidad de vida de los ciudadanos. Fase III; S2013/MIT-2748), funded by Programas de Actividades I+D en la Comunidad de Madrid and cofunded by Structural Funds of the EU.
-  S. Calinon, F. Guenter, and A. Billard, “On Learning, Representing, and Generalizing a Task in a Humanoid Robot,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 2, pp. 286–298, Apr. 2007.
-  A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, Feb. 2013.
-  L. Rozo, S. Calinon, D. G. Caldwell, P. Jimenez, and C. Torras, “Learning collaborative impedance-based robot behaviors,” in AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 2013, pp. 1422–1428.
-  D. Koert, G. Maeda, R. Lioutikov, G. Neumann, and J. Peters, “Demonstration based trajectory optimization for generalizable robot motions,” in Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2016.
-  S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
-  S. Morante, J. G. Victores, A. Jardón, and C. Balaguer, “Action effect generalization, recognition and execution through continuous goal-directed actions,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1822–1827.
-  S. Morante, J. G. Victores, and C. Balaguer, “Automatic demonstration and feature selection for robot learning,” in 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). IEEE, Nov. 2015, pp. 428–433.
-  S. Morante, J. G. Victores, A. Jardón, and C. Balaguer, “Humanoid robot imitation through continuous goal-directed actions: an evolutionary approach,” Advanced Robotics, vol. 29, no. 5, pp. 303–314, 2015.
-  M. Müller, Dynamic Time Warping. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 69–84.
-  S. Martínez, C. A. Monje, A. Jardón, P. Pierro, C. Balaguer, and D. Munoz, “Teo: Full-size humanoid robot design powered by a fuel cell system,” Cybernetics and Systems, vol. 43, no. 3, pp. 163–180, 2012.
-  S. Picek, M. Golub, and D. Jakobovic, “Evaluation of crossover operator performance in genetic algorithms with binary representation,” in International Conference on Intelligent Computing. Springer, 2011, pp. 223–230.
-  P. Fitzpatrick, G. Metta, and L. Natale, “Towards long-lived robot genes,” Robotics and Autonomous systems, vol. 56, no. 1, pp. 29–45, 2008.
-  R. Diankov, “Automated construction of robotic manipulation programs,” Ph.D. dissertation, Carnegie Mellon University, Robotics Institute, August 2010.
-  J. B. Hamrick, K. A. Smith, T. L. Griffiths, and E. Vul, “Think again? The amount of mental simulation tracks uncertainty in the outcome,” 37th Annual Conference of the Cognitive Science Society, vol. 1, 2015.
-  D. Vogt, S. Stepputtis, R. Weinhold, B. Jung, and H. B. Amor, “Learning Human-Robot Interactions from Human-Human Demonstrations (with Applications in Lego Rocket Assembly),” 2016 IEEE-RAS International Conference on Humanoid Robots (Humanoids 2016), pp. 142–143, 2016.
-  P. Kormushev, S. Calinon, and D. G. Caldwell, “Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input,” Advanced Robotics, vol. 25, no. 5, pp. 581–603, 2011.