Service robots need to perform a variety of tasks in different domains, which requires flexible and fast motion planners. Stable dynamical systems (DS) arise as a promising approach for real-time motion generation and gained increasing attention in the community [DMP, Pervez18, SEDS, Clf, tau-SEDS, Perrin16, Blocher17, duan2017fast, NeuralLearn2]. DS have several interesting properties that make them well-suited for motion generation. Indeed, DS can plan stable motions from any starting point to any target position in the robot’s workspace. DS can on-line replan the robot’s trajectory to cope with changes in the target position or unexpected obstacles [DS_avoidance, Saveriano13, Saveriano14, Saveriano17]. DS motions can be incrementally updated as novel task demonstrations are provided [Kronander15, Saveriano18] or confined to constrained domains [Saveriano19_Learning] without compromising the convergence to the goal. Finally, DS have been used to encode primitive robotic skills or movement primitives [Schaal99], which can be properly scheduled to execute complex tasks [Caccavale17, Caccavale18]. When the DS is learned from demonstration, we are often interested in accurately reproducing the demonstrated trajectories. This is a key difference between DS-based and learning-based motion planners (like [pfeiffer2017from]), where the objective is to learn and generalize collision-free, goal-reaching paths given (imperfect) demonstrations.
|Global stability||Multiple motions||Any regression technique||Autonomous DS||Single-step learning||Training time|
Dynamic movement primitives (DMPs) [DMP] are certainly the most used approach for DS-based motion generation. A DMP is the superposition of a linear system and a non-linear term learned from a single demonstration. A time-dependent clock signal suppresses the non-linear term to enforce the convergence towards a given target. The DMP framework has been extended in several ways. For instance [Pervez18, TP-DMP] introduce task-dependent parameters to customize the robot motion, while [Saveriano19_Merging] investigates the possibility to merge multiple DMPs to plan more complex pose trajectories. A problem of DMPs is that the clock signal may introduce deviations from the demonstration. Although time rescaling techniques have been proposed [DMP], DMPs still miss generalization capabilities outside the demonstration area [tau-SEDS].
An alternative approach is to encode the motion into a state-dependent DS. This idea is exploited by the stable estimator of dynamical systems (SEDS) in [SEDS], where the parameters of a Gaussian mixture regression (GMR) are constrained to ensure global stability. However, SEDS exploits quadratic stability constraints that limit the reproduction accuracy. Researchers in the field have realized that, in some cases, accuracy and stability are conflicting objectives to achieve. This is known as the accuracy vs stability dilemma [tau-SEDS] and several approaches have been proposed to improve the accuracy while preserving the stability.
The work in [Blocher17, duan2017fast] focus on a single regression technique and try to maximize the accuracy while keeping the stability. Similarly, [Perrin16] proposes to transform a linear DS using a learned diffeomorphic mapping111A diffeomorphism is a bijective, continuous, and continuously differentiable mapping with a continuous and continuously differentiable inverse. that accurately represents a single demonstration. These approaches have the advantage of handling the accuracy vs stability dilemma within a single-step learning procedure. However, it is clear that focusing on a single regression technique limits the applicability of the approach because it is not known a prior which regression technique works better for a given application.
Alternative approaches apply a divide et impera procedure and separate accuracy and stability into two different (two-steps) learning processes. The work in [Clf] learns a flexible Lyapunov function which reduces the contradictions between stability constraints and training data. The parameters of this Lyapunov function are determined by solving a constrained optimization problem. The approach in [tau-SEDS] assumes that there exists a diffeomorphed space where the training data are accurately represented by a quadratic Lyapunov function. The authors propose to learn a diffeomorphic mapping from training data, project the data into the diffeomorphed space, and then use an approach like SEDS to fit a stable DS. Although the approaches in [Clf, tau-SEDS] result in accurate motions, the two-step learning requires to fit both a Lyapunov function (or a diffeomorphism) and a dynamical system from training data. Hence, these approaches introduce further tunable parameters and increase the overall training time. The main features of the presented approaches are listed in Tab. I.
In this paper, we propose the Energy-based Stabilizer of Dynamical Systems (ESDS), a single-step learning framework that copes with the stability vs accuracy dilemma. ESDS exploits a particular form of DS consisting of a superposition of a conservative vector field (a linear dynamics) and a non-linear, in general dissipative one. The dissipative vector field is learned from demonstrations and retrieved at run-time using any regression technique. Therefore, ESDS is not targeted to a single regression technique like[Perrin16, Blocher17, duan2017fast]. To stabilize the learned dynamics, ESDS proceeds with energy considerations in a way that resembles the energy tank-based controllers [Franken11, Kronander16, Selvaggio19], but with the objective to ensure the stability of the DS rather than its passivity. To this end, we augment a quadratic Lyapunov function with an additive term that plays the role of a virtual energy tank. With this formulation, the non-linear field is followed until there is energy in the tank and smoothly vanishes if the energy is depleted. Being the initial value of the energy of importance for an accurate reproduction, we propose an approach to determine the initial energy from training data. An experimental comparison shows that ESDS reaches similar or better accuracy than two-step approaches in a time that is comparable with single-step ones, which makes ESDS a promising approach for motion generation with stable DS.
Ii Proposed Approach
Ii-a Problem definition and background material
We represent a robotic skill as a mapping between the robot position (in joint or Cartesian space) and its velocity . Therefore, the skill can be represented as a first-order dynamical system (DS) in the form
where the time dependencies have been omitted to ease the notation. In general, the mapping occurs through the continuous and continuously differentiable non-linear function . A solution of (1) represents a trajectory. Different initial conditions generate different trajectories. An equilibrium point is a point where the velocity vanishes, i.e. . is locally asymptotically stable (LAS) if . If , is globally asymptotically stable (GAS) and it is the only equilibrium of the DS.
A sufficient condition for to be GAS is that there exists a scalar, continuously differentiable function of the state variables satisfying [Slotine91]:
Our goal is to represent a set of training data (position/velocity pairs) as a first-order DS that satisfies the following requirements:
The point is a GAS—and therefore unique—equilibrium of the DS.
The trajectories of the DS “accurately” represent the training data.
An approach that satisfies 1) and 2) is presented as follows. In the rest of the paper, we consider without loss of generality. A different goal can be reached by applying the constant state translation .
Ii-B Dynamical system definition
In this work, we use a DS different from (1). More specifically, we consider the non-linear function as the superposition of two vector fields: i) a stable, linear vector field and ii) a non-linear, in general non-conservative, vector field . The DS used in this work is
where is a smooth and positive semidefinite function that vanishes at the equilibrium (). The function is used to ensure that the DS in (3) has an equilibrium point at . We use in this work.
At this point, we have no guarantee that is GAS, since the DS can fall in a spurious equilibrium, follow a periodic orbit, or even diverge. The stability of (3) is analyzed in Sec. II-D, while the rest of this section explains how to encode a set of demonstrations in a DS defined as in (3). Consider that a set of demonstrations is given as , where is the desired robot position (in joint or Cartesian space) at time and demonstration , is the desired robot velocity, is the number of samples in each demonstration, and the number of demonstrations. The demonstrations are firstly converted into input/output pairs by rewriting the DS in (3) as
Equation (4) shows that the unknown term is a non-linear mapping between and . Therefore, we use the demonstrated states as input and as observations of
in a supervised learning process. In other words, we add a non-linear, state-dependent displacementto the linear and convergent dynamics to closely reproduce the demonstrated velocities. Given the input/output pairs /, any regression technique can be used to learn and retrieve a smooth velocity for each state. Note that the definition of in (5) does not generate discontinuities because for .
Ii-C Energy tank-based Lyapunov function
As mentioned in Sec. II-B, the stability of (3) is not guaranteed yet. To this end, we aim at exploiting the Lyapunov stability results summarized in Sec. II-A. The easiest way to define a function that satisfies conditions (2a) and (2c), i.e. a candidate Lyapunov function, is to consider the quadratic potential . In order to fulfill also condition (2b), one has to show that the time derivative , i.e. one has to analyze the sign of
where the definition of from (3) has been used. By inspecting (6), it is straightforward to verify that , and that the term is always negative. The problem is that the sign of cannot be determined in advance even if we know that . Therefore, the stability condition (2b) is not automatically verified. Several approaches have been proposed to ensure that (see, for example, [SEDS, tau-SEDS]). However, it is known that the stability conditions imposed by a quadratic potential may prevent accurate motion learning [Blocher17, Clf].
Instead of imposing quadratic stability constraints on the DS, we propose an alternative solution inspired by energy tank-based controllers [Franken11, Kronander16, Selvaggio19]. The idea is to associate to the DS a virtual energy tank and reuse the energy dissipated when . Stable motions generated by increase the level of the tank while possibly unstable motions () reduce the level of the tank. In this formulation, possibly unstable motions are executed unless the storage is depleted, preserving the overall stability of the system. More formally, we consider the augmented Lyapunov candidate
where the function plays the role of an energy tank and keeps track of the energy dissipated by the DS in previous instants. Recall that a candidate Lyapunov function satisfies conditions (2a) and (2c), i.e. it is positive definite and radially unbounded. Assuming that is positive semi-definite and zero at least for , condition (2a) is satisfied. Notice that, even if for some , . Condition (2c) can be fulfilled by requiring that for or, as in this work, by assuming that is bounded.
In principle, it is possible to assign to an arbitrary dynamics that results in a positive semi-definite and bounded that vanishes at . Let us first define the dynamics
We can now define the dynamics of as
where , is defined in Sec. II-B, and is the maximum value of . Indeed, the storage is initialized as , with . Note that the first condition in (9) guarantees that both and in (7) vanish at the equilibrium point. The gain satisfies
where is strictly smaller than to ensure global stability (see Sec. II-D). The scalar gain satisfies
The parameter is the minimum value of and we use in this work. The choice of and is made to ensure that . In this way, the function is positive definite. Moreover, the dynamics of in (9) guarantees that if . Recalling that , this allows us to conclude that . The function is also radially unbounded since is bounded and is unbounded.
The non-linear term of the DS (3) can be followed until there is energy in the storage. Indeed, one can extract energy from the storage () if and only if . On the contrary, when the storage is depleted (), the non-linear term may compromise the stability and it has to be suppressed. To this end, we modify the DS in (3) as
where the scalar gain satisfies
By inspecting (12) and (13), it is clear that suppresses the term when the storage is depleted. In this case, the DS is driven towards the equilibrium by the linear—and therefore stable—term . The functions , , and are defined in Tab. II. These piecewise-defined functions are chosen to guarantee smooth switching between different conditions. It is worth noticing that the formulation in (10)–(13) resembles that in [Kronander16]. In [Kronander16], torque commands for the robot are generated by feeding the robot state in a given DS. The torque command has a non-passive term that is suppressed by a function similar to . Here, we apply a similar approach with the goal of learning an accurate and converging open-loop representation of a demonstrated motion. As a final remark, introduces a time dependency in the dynamical system (12). This DS formulation (3) is then similar to DMPs [DMP], in the sense that DMPs also use a time-dependent (clock) signal to retrieve asymptotic stability. However, in the DMP formulation, the non-linear term only depends on time and is learned from a single demonstration, while in ESDS the term is state dependent and can be learned from multiple demonstrations. Moreover, the clock signal in the DMP exponentially vanishes with time, while the storage can be potentially charged during the motion, helping to accurately reproduce the demonstrations.
Ii-D Stability analysis
First, we show that is a proper the candidate Lyapunov function, i.e. it satisfies (2a) and (2c). is radially unbounded (condition (2c)) because is bounded. Moreover, since the dynamics of the storage function is defined as in (9), it holds that . Since and satisfy conditions (10) and (11) respectively, and being , it also holds that . Consequently, and only at (condition (2a)).
Second, we prove that condition (2b) holds. By taking the time derivative of we have that . The function vanishes at the equilibrium because . Therefore, we have to prove that . Considering the definition of in (12) and in (9), we have to consider two cases.
Case I: In this case , hence
Now, being from (10) and , the term in (14) is negative and vanishes only at . The second term also vanishes at because . Moreover, being for and for , it holds that . This allows as to conclude that and for (condition (2b)).
Case II: The dynamics of is , hence
The terms and in (15) are negative, while the sign of depends on the sign of . For , it holds that . For , it holds from (13) that . Being , it holds from (10) that . From (8), implies that . Given these equalities and inspecting (9), it is straightforward to verify that , that implies and for .
Theorem 1 still holds if the DS converges towards a different equilibrium . One has to simply apply the constant state translation and define the DS as .
Ii-E Storage function initialization
The initial value of the storage function affects the trajectory retrieved from the DS (12) and, as a consequence, the overall accuracy in reproducing the demonstrations. As qualitatively shown in Fig. 1, small values of cause the storage to be depleted too quickly introducing deviations from the demonstrated data (Fig. 1(a) and 1(b)). For this reason, we propose an approach to estimate the value of from training data.
To this end, we consider
that is the dynamics in (8) with , , and for and for . Notice that, being , we do not consider the possible charge introduced by . Moreover, being for , we do not consider the possible charge introduced by . Therefore, the storage dynamics in (16) only considers the depleted energy. Now, we compute for all the training data ( and ), and select
where is the sampling time. Each represents an upper bound on the energy depleted by the system (12) to reproduce the -th demonstration and, by selecting and , there will be enough energy to accurately reproduce the demonstrations without affecting the stability. Note that the in (17) does not guarantee that the storage is not depleted for any initial condition but only sufficiently close to the demonstrations. However, even if the storage is depleted before reaching the goal, the DS (12) is still stable (Theorem 1) and generated motion converges to the goal.
Iii Results and Comparisons
The results in this section show the capabilities of ESDS. First, we compare ESDS with prominent approaches in the field using the LASA Handwriting dataset [LASA_dataset] as a benchmark. The dataset contains the stable 2D motions shown in Fig. 2. Then, we present an experiment on a real robot.
Iii-a Accuracy and computational complexity
The aim of this test is to quantitatively compare several approaches for motion generation with stable DS. The comparison is carried out considering two major quantities, namely the reproduction accuracy and the required training time. The accuracy is quantitatively measured using the swept error area () [Clf] and the velocity root mean square error [NeuralLearn2]. The is defined as , where is the DS position and the demonstrated position at time and demonstration , is the number of samples in each demonstration, is the number of demonstrations, and is the area of the tetragon formed by , , , and . Notice that DS trajectories are equidistantly re-sampled to contain exactly points. The is defined as , where is the velocity retrieved from the learned DS. For our approach, is defined as in (12). In other words, the measures the error in reproducing the shape of the demonstrated motions, while the measures how the DS preserves the demonstrated velocities.
Having defined the accuracy metrics, we present comparative results obtained by downsampling each demonstration in the LASA dataset to points and by considering the first three demonstrations for each motion. Qualitative results of ELDS on this dataset are depicted in Fig. 2, while quantitative results are summarized in Tab. III. For ESDS, the initial value of the storage is automatically determined as detailed in Sec. II-E, while other quantities are defined as in Tab. II following the implementation in [Kronander16, Smooth_functions]. In order to be consistent with state-of-the-art approaches [SEDS, Blocher17, tau-SEDS], we used GMR as regression technique in this evaluation also for ESDS and CLF-DM in [Clf]. For all the approaches that uses GMR, the optimal number of Gaussian components is independently found for each approach and each motion by fitting a stable DS for each , computing the for each , and choosing as the number of components that results in the minimum . It is worth reminding that we cannot use GMR for the FDM approach in [Perrin16] since it applies to linear DS, and that ESDS and CLF-DM can work with any regression technique (an example for ESDS is shown in Sec. III-B). For FDM we use kernel functions and run the diffeomorphic matching algorithm for a maximum of iterations. C-GMR parameters are selected as in [Blocher17]. SEDS, CLF-DM, and -SEDS require non-linear optimization loops. For SEDS, we use the likelihood as cost function and run the optimization for a maximum of iterations. CLF-DM parameterizes the Lyapunov function as a weighted sum of asymmetric quadratic functions (WSAQF), whose parameters are learned via numerical optimization. We search for the optimal number of asymmetric components by repeating the training for and choosing as the number of asymmetric components that results in the minimum . Being significantly faster then SEDS, we let the optimization run for a maximum of iterations. -SEDS learns a WSAQF from demonstrations ( iterations), computes a diffeomorphism from the learned WSAQF, and applies SEDS ( iterations) on the diffeomorphed data.
Results obtained with the best set-up ( and, when needed, ) are reported in Tab. III. Looking at the results, it is clear that SEDS introduces strong deformations in the learned motions and it requires a relatively long training time. These results are in accordance with previous studies [Clf, tau-SEDS, Blocher17]. FDM is extremely fast, but it introduces deformations in the learned motion. The reason is that learning from multiple demonstrations is not possible and one has to average across multiple demonstrations to get a unique trajectory. This is also the reason why the training time of FDM is almost constant across different motions. -SEDS has a slightly better accuracy than FDM but is the slowest among the considered approaches. The long training time is expected since -SEDS needs to learn both a diffeomorphism (via CLF-DM) and a stable DS (via SEDS). Notice that the accuracy of -SEDS can be improved by increasing the number of iterations in the SEDS algorithm. However, our implementation takes more than hours to find and for all the considered motions and generate the results. Therefore, we decided to limit the number of iterations to . C-GMR performs well in terms of accuracy and training time. The problem with this approach is that the shape of the motion is significantly deformed outside a region of the state-space that contains the demonstrations. In other words, accuracy depends on a careful choice of the parameters that define this demonstration area. CLF-DM achieves high accuracy and it is the fastest among the approaches that exploit constrained optimization, i.e. SEDS and -SEDS. The proposed ESDS does not introduce an additional training time or significant motion deviations. Therefore, with GMR regression, our approach has a training time comparable to C-GMR and FDM, it has the best value and a close to C-GMR and CLF-DM. These results show that ESDS is a good compromise between accuracy and training time.
|Learning||[mm2]||[mm/s]||Train. Time [s]|
|Method||(mean / range)||(mean / range)||(mean / range)|
|ESDS (Ours)||431.5 / [26.0-1307]||15.8 / [6.2-31.6]||0.08 / [0.03-0.17]|
|SEDS [SEDS]||1022 / [34.9-2300]||48.4 / [11.4-136.8]||16.3 / [1.2-55.1]|
|CLF-DM [Clf]||460.7 / [16.6-1269]||14.1 / [5.6-23.8]||2.3 / [0.09-21.5]|
|-SEDS [tau-SEDS]||537.0 / [26.4-1139]||27.1 / [9.5-53.8]||25.3 / [7.6-55.4]|
|C-GMR [Blocher17]||496.7 / [20.3-1840]||14.0 / [6.2-24.1]||0.1 / [0.03-0.28]|
|FDM [Perrin16]||550.5 / [42.0-1769]||31.4 / [9.7-70.2]||0.08 / [0.07-0.09]|
|*The source code of SEDS and CLF-DM is available on-line [SEDS_Code, CLF_Code].|
|**The author would like to thank N. Perrin and P. Schlehuber-Caissier for providing|
|the source code of the FDM approach in [Perrin16].|
Iii-B Robot experiment
The goal of this experiment is two-fold. First, we aim at showing that ESDS works with different regression techniques. Therefore, Gaussian process regression [Rasmussen06] is used to retrieve . Second, we show that a DS can encode different motions in different regions of the state-space. To this end, we provide demonstrations of a task consisting in entering into a box from different initial positions (Fig. 3). Demonstrations are provided by kinesthetically guiding the robot towards the task execution. In our setup, the robot’s interface applies a low-pass filter with cut-off frequency of Hz to the measured Cartesian positions (grey lines in Fig. 3(a)). Depending on the starting point, the robot has to surround the box to prevent a collision (brown lines in Fig. 3(a)) or it can follow an almost linear path (green lines in Fig. 3(a)). Both the behaviors are successfully encoded into the same DS, and ESDS is used to retrieve a stable motion converging to a unique target. As shown in Fig. 3(a), the robot is able to generalize the learned skill starting from initial positions that are not in the given training set. Snapshots of the surrounding motion are shown in Fig. 3(b). In this experiment, ESDS generates a motion trajectory by integrating the robot initial position and the robot tracks this reference trajectory using the build-in impedance controller. In principle, it is possible to combine ESDS and the approach in [Kronander16] to feed the measured robot state during the execution. However, the combination may potentially affect the overall accuracy. Further investigations on this point are beyond the scope of this paper and are left as a future work.
Iv Conclusions and Future Work
We presented the Energy-based Stabilizer of Dynamical Systems (ESDS), a novel approach for learning stable motions from human demonstrations. Inspired by energy tank-based controllers, we design a suitable Lyapunov function and stabilize the learned DS at run-time according to the virtual energy present into the system. The initial value of the energy, a parameter that significantly affects the reproduction accuracy, is also automatically estimated from training data. Comparisons on a public benchmark show that ESDS achieves high accuracy with reduced training time, while experiments on a real robot show that ESDS can stabilize a DS retrieved with different regression techniques.
As a future work, we plan to provide more comprehensive results with experiments and comparisons in more challenging scenarios. Moreover, we plan to exploit ESDS in an incremental learning scenario where novel demonstrations of robotic skills are continuously provided at run-time.