Local Online Motor Babbling: Learning Motor Abundance of A Musculoskeletal Robot Arm

Motor babbling and goal babbling has been used for sensorimotor learning of highly redundant systems in soft robotics. Recent works in goal babbling has demonstrated successful learning of inverse kinematics (IK) on such systems, and suggests that babbling in the goal space better resolves motor redundancy by learning as few sensorimotor mapping as possible. However, for musculoskeletal robot systems, motor redundancy can be of useful information to explain muscle activation patterns, thus the term motor abundance. In this work, we introduce some simple heuristics to empirically define the unknown goal space, and learn the inverse kinematics of a 10 DoF musculoskeletal robot arm using directed goal babbling. We then further propose local online motor babbling using Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which bootstraps on the collected samples in goal babbling for initialization, such that motor abundance can be queried for any static goal within the defined goal space. The result shows that our motor babbling approach can efficiently explore motor abundance, and gives useful insights in terms of muscle stiffness and synergy.



There are no comments yet.


page 1

page 7


Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots

We introduce the Self-Adaptive Goal Generation - Robust Intelligent Adap...

Design Paradigms Based on Spring Agonists for Underactuated Robot Hands: Concepts and Application

In this paper, we focus on a rarely used paradigm in the design of under...

Neurally Plausible Model of Robot Reaching Inspired by Infant Motor Babbling

In this paper we present a neurally plausible model of robot reaching in...

Socially Guided Intrinsic Motivation for Robot Learning of Motor Skills

This paper presents a technical approach to robot learning of motor skil...

Analysis of the displacement of terrestrial mobile robots in corridors using paraconsistent annotated evidential logic eτ

This article proposes an algorithm for a servo motor that controls the m...

Active exploration for body model learning through self-touch on a humanoid robot with artificial skin

The mechanisms of infant development are far from understood. Learning a...

A new model for Cerebellar computation

The standard state space model is widely believed to account for the cer...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

The human body is an over-actuated system, not only does it have a higher dimension in motor space than the degree of freedoms in the action space, i.e., more number of muscles than joints, it also has more degree of freedoms (DoFs) than necessary to achieve a certain motor task. How the effector redundant system adaptively coordinates movements remains a challenging problem. In the field of robot learning, when assuming rigid body links with pure rotation and translation

[1], model learning is commonly used to learn the forward or inverse models of kinematics and dynamics for accurate yet agile control[2]. However for bio-mechanical and soft robots such as the elephant trunks[3], or musculuskeletal systems[4][5], where models based on rigid body links are no longer available, learning becomes difficult due to the highly redundant and non-stationary nature of such systems.

Fig. 1: 10 DoF musculoskeletal robot arm actuated by 24 pneumatic artificial muscles (PAMs), with an empirically defined goal space in reference to the red marker, visualized in rviz.

This paper investigates the reaching skills and motor variability of the reached points on a musculuskeletal robot arm[6], an over-actuated system of 24 Pneumatic Artificial Muscles (PAMs) actuating 10 DoFs, as shown in Fig. 1. Traditionally, this problem could be addressed by learning the forward kinematics using motor babbling, and explore the motor-sensory mapping from scratch[7, 8, 9] until eventually the robot can predict the effects of its own actions. However autonomous exploration without prior knowledge in motor babbling doesn’t scale well to high dimensional sensorimotor space, due to the rather inefficient sampling of random motor commands in over-actuated systems. An alternative in [10]

suggests that learning inverse kinematics by goal babbling with active exploration, avoids the curse of dimensionality simply because the goal space is of much smaller dimension than the redundant motor space. Nonetheless,

[10] assumes that the sensorimotor space can be entirely explored, which is not feasible in practice for high dimensional motor systems[3]. Another alternative is then to specify the goal space a priori as a grid, and sampling the goal grid points to guide exploration[11], such that sensorimotor mapping can be sufficiently generalized and bootstrapped for efficient online learning. It has also been quantitatively evaluated for an average of sub-centimeter reaching accuracy on an elephant trunk robot [3] with reasonable experiment time. We therefore implement and further extend on directed goal babbling in [3]. Since the goal space of the robot arm is unknown and non-convex[6]

, we empirically estimate the goal space with randomly generated postures, forcing the convex hull such that directed goal babbling can be applied, and subsequently remove the outlier goals in the goal space after learning.

Given the above works aiming to reduce motor redundancy for learning[7, 8, 9, 10, 11, 3], it can be argued that motor redundancy in human musculoskeletal systems is actually the key stone to natural movements with flexibility and adaptability, hence should be termed motor abundance rather than redundancy [12][13]. In robotics motor learning, [14] also suggests that joint redundancy facilitates motor learning, whereas task space variability does not. Thus we build on directed goal babbling [3], and and propose local online motor babbling, in order to explore motor abundance while fixing space variability on the musculoskeletal robot arm. Local online motor babbling uses CMA-ES initialized by local samples generated from directed goal babbling, such that explorations in the motor space is effectively constrained locally to any queried goal within the goal space, and efficiently generated by adapting the covariance.

This paper is organized as follows: in Section II and III directed online goal babbling and CMA-ES are reviewed. Section IV introduces the simple heuristics to define the goal space, implements directed goal babbling on the musculoskeletal robot arm, and evaluates the learning results. Section V proposes, implements, and evaluates local online motor babbling using CMA-ES to query motor abundance, while providing some insights in muscle stiffness and muscle synergy of the musculoskeletal robot system. Section VI concludes the paper and discusses possible future research directions.

Ii Directed Goal Balling

Given the specified convex goal space encapsulating goal points, and denoting all the reachable set of commands in the motor space as , the aim is to learn the inverse kinematics model , that generalizes all points in the goal space to a subset of solutions in the motor space. Starting from the known home position , and home posture , i.e., the inverse mapping , the goal-directed exploration is generated by


where is the inverse mapping given learning parameter , and adds perturbation noise to discover new positions or more efficient motor commands in reaching goals. At every time step, the motor system forwards the perturbed inverse estimate, , and the actual

samples are used for regression, where prototype vectors and local linear mapping

[15] is used as the regression model, and to monitor the progress of exploration in the defined goal space.

The major part of directed goal babbling is to direct the babbling of the end effector at specified goals and target positions. Each trial of goal babbling is directed at one goal randomly chosen from

, and continuous piecewise linear targets are interpolated along the path


where are the target position and final goal of the trial, and being the step size. Target positions are generated until is closer than to , then a new goal is chosen. The purpose of directed goal babbling is to generate smooth movement around the end effector position, such that the locally learned prototype vectors can bootstrap and extend the exploration of the goal space, and allow the integration of the following weighting scheme


and measure direction and kinematic efficiency of the movement, such that inconsistency of a folded manifold, and redundant joint positions can be optimized[11]. The multiplicative weighting factor is then integrated to the gradient descent that fits the currently generated samples by reducing the weighted square error (see Appendix).

To prevent drifting to irrelevant regions and facilitate bootstrapping on the local prototype centers, the system returns to

with probability

instead of following another goal directed movement. Returning to home posture stablizes the exploration in the known area of the sensorimotor space [12], [18], similar to infants returning their arms to a comfortable resting posture between practices:


the system moves from the last posture to the home posture in the same way as in (2) by linearly interpolating the via-points along the path, until

The exploratory noise, or motor perturbation in 1

, is crucial for discovering new postures that would otherwise not be found by the inverse estimate [12], [29]. By exploring the local surrounding of the inverse estimate with i.i.d normal distribution in each motor dimension, and varying these distribution parameters with a normalized Gaussian random walk, the noise is modeled as:


where all entries in the matrix is initialized and varied as follows:

After learning, the average reaching accuracy is evaluated by querying the inverse model for every goal within the defined goal space , and a simple feedback controller to adapt execution failures. Execution failure occurs when the inverse estimate is not possible to execute, i.e., , due to interference, non-stationary bionic robot design and constantly changing overtime. Given the queried goal and the predicted posture , where , the feedback controller would slightly shift the queried goal from to , then forwarding the inverse estimate . Target shifting follows the current observed error , and integrated over time:


Iii Cma-Es

CMA-ES is a method of black box optimization that minimizes the objective function , , where is assumed to be a high dimensional, non-convex, non-separable, and ill-conditioned mapping of the multi-variate state space. The idea of CMA-ES is introducing a multi-variate normal distribution to sample a population, evaluating the population to select the good candidates, and updating the search distribution parameters by adapting the covariance and shifting the mean of the distribution according to the candidates.

Given a start point

and initializing the covariance to identity matrix

, the search points in one population iteration is sampled as follows:


where , being the mean vector, being the step-size, and is the population size. For notation simplicity, the iteration index is henceforth omitted.

The mean vector is updated by using the non-elitistic selection [16]. Let denote the th best solution in the population of , the best points from the sampled population are then selected, such that , and weighted intermediate recombination is applied:


The step size is updated using cumulative step-size adaptation (CSA). The intuition is when the evolution path, i.e., the sum of successive steps, is short, single steps tend to be uncorrelated and cancel each other out, thus the step-size should be decreased. On the contrary, when evolution path is long, single steps points to similar directions and tend to be correlated, therefore increasing the step size. Initializing the evolution path vector , and setting the constants , the step size is updated as:


The essential part of the evolution strategy is the covariance matrix adaptation. It is suggested that the line distribution adapted using rank-one update will increase the likelihood of generating successful steps , because the adaptation follows a natural gradient approximation of the expected fitness of the population .


Iv Inverse Kinematics Learning

We use a 10 DoF musculoskeletal robot arm from [6] for the experiments, where the robot is controlled via ROS messages. The arm is driven by 24 pneumatic muscles, each with pressure actuation range of MPa.

Fig. 2: Empirical goal space (in blue) sampled from 2000 random postures, and the convex goal space (in red), which is used for learning as shown in Fig. 1

As shown in Fig. 1, the hand of the robot is replaced with a tennis ball as the color marker, and tracking of the end effector is performed in reference to the center of the red marker as the origin, using Intel RealSense ZR300. However the tracking introduces an error up to 1cm in depth, i.e., x-axis, and sub-millimeter error in y and z axis. The colored point cloud overlayed in ROS rviz is the specified convex goal space as in Fig. 1. The control accuracy of the robot is tested according to [3]. By repeating random postures for times each, the average Euclidean norm error is computed to be cm as follows:

Iv-a Define the Goal Space

The complete task space of the upper limb robot is unknown and non-convex, however directed goal babbling would require the specified goal space to be convex to efficiently bootstrap and allow the integration of the weighting scheme in (5). Thus we first empirically estimate the goal space by randomly generating random postures for each muscle within MPa, and take the encapsulated convex hull as the empirical goal space . In order to approximate the uniform samples in for efficient online learning and evaluations, a cube grid with spacing encapsulating is defined, where . The sampled convex hull goal grid in Fig. 1 is then made from the intersection of all points in the empirical goal space and the cube grid, i.e., . However, as shown in Fig. 2, is a slanted non-convex irregular ellipsoid, forcing a convex hull in the random posture samples would introduce non-reachable regions in the goal space. This is addressed later with the similar set operation to remove the outlier goals using the learned prototype vector space.

Fig. 3: Decreasing performance error up to 20000 samples, i.e., the average Euclidean norm to the goals evaluated throughout the convex goal space , the feedback controller is applied at 20000 samples, resulting an average error of 3.5 cm. However there are still outlier goals remaining from forcing the convex hull, thus we take the explored prototype sphere space and intersect with , i.e., to remove the outliers. Evaluating on the cut goal space reduces the error to 1.8 cm
Fig. 4: Performance error distribution of the convex goal space used for IK learning in Fig. 2, and that of the cut goal space in Fig. 5

Iv-B Experiment and Results

The experiment is conducted with samples, with target step length , which corresponds to the target velocity of cm/s, allowing the robot to generate smooth local movements. The sampling rate is set to , generating 5 targets and directed micro movements for learning. After every 4000 samples, performance evaluation is carried out online. The learning experiment including online evaluations amount to less than 2 hours real time. As illustrated in Fig. 3, the learning bootstraps quite fast in the first 4000 samples, followed by a slow convergence until 16000 samples. At , the feedback controller is applied, the performance error drops to 3.4cm. However in there are still many outlier goals, which are the non-reachable regions introduced by forcing the convex hull. A similar set intersection operation is applied with the learned prototype spheres and the goal space , where is taken as the encapsulated space of the prototype spheres, and the final goal space is , as shown in Fig. 5, where the number of goals has been reduced from 179 in to 94 in . We then evaluate again these 94 goals with the feedback controller, the performance error reduces further to an average of 1.8 cm in 3. However due to the forced convex hull , local inverse models cannot efficiently regress at the edge of the task space, the error distribution still shows a few errors larger than 3 cm, which can be further reduced later by motor babbling using CMA-ES.

Fig. 5: Prototype spheres (green) encapsulating the final goal space (red), which is used for goal babbling, after outlier goals has been removed.

V Learning Motor Redundancy

CMA-ES explores by expanding the search distribution of the parameters, shifting the mean and expanding covariance, until optimum solution is found within that distribution, followed by shrinking the covariance and shifting the mean to the global optimum. By intentionally setting the initial mean vector slightly away from the opitimum, i.e., the posture that leads to closest end effector position to the goal, CMA-ES would naturally expand the covariance while keep the search within the vicinity of the queried goal, as the objective function is set to minimize the goal reaching error. Essentially CMA-ES is used to effectively generate motor babbling data, which can be achieved by initializing the mean vector with neighboring goal postures of the queried goal

, and the step-size with the empirical variance estimate of the local samples around

gathered from the goal babbling process.

V-a Local Online Motor Babbling

When learning inverse kinematics using online goal babbling, since there are multiple postures reaching , it is assumed that we don’t need to know all redundancy, and only learn the ones with most direction and kinematic efficiency by integrating the weighting scheme (5) in the optimization. In fact, is not only unknown, and may never be exhaustively explored on a physical system, but also non-stationary due to the nature of musculoskeletal robot design with PAMs. This can be addressed by using the simple feedback controller in (8), where execution failures due to the changing of are adapted when the queried goal is slightly shifted based on the proportion of the euclidean error .

input : , ,
output : 
initialize : 
select N closest goals to ;
for  to  do
       for  to  do
             if  then
                  collect (, ) In ;
             end if
       end for
      select for the minimum in ;
       initialize ;
       while  do
             sample posture population as in (9);
             for  to  do
                   if  then
                         collect in ;
                   end if
             end for
            update as in (10);
             update and as in (11), (12);
             update and as in (13), (14);
       end while
end for
Algorithm 1 Motor Babbling Using CMA-ES

As illustrated in Algorithm 1, the queried goal , the learned inverse model , and the neighboring postures , which is collected from the goal babbling process, are the input to online motor babbling. The aim of the algorithm is to output a new posture configuration set , from which different muscle stiffness can be generated while keeping the end effector position fixed. The initialization sets the gain and number of iteration of the feedback controller to , , t number of trials for CMA-ES , and the prototype sphere radius is . We use pycma library[17] to implement CMA-ES, where we encode variables in the objective function implicitly [16]. The objective function is simply set as the euclidean norm to the goal scaled with a constant, i.e., , where , and the optimum objective function value is set to , meaning that an empirical optimum of to the goal, which is also the stopping criteria for each CMA-ES trial.

Each trial of CMA-ES starts by iterating the feedback controller and finding the posture that leads closest to the neighboring goal, and is subsequently used to initialize the mean vector . The covariance is initialized to be an identity matrix, which allows isotropic search and avoids bias. In order to initialize the step-size, an empirical variance is estimated from , and the mean of the variance is taken as initialization. The union of the two sets is to ensure sufficient data for a feasible estimation. Near the home position, which is the centroid of the goal space, many data samples are available as online goal babbling often comes back to . However around the edges of the goal space, there are often very few local samples, sometimes less than the action space dimension, i.e., the 24 muscles. By taking in the samples generated by feedback controller, a better initialization of can be robustly estimated.

V-B Visualizing Muscle Abundance

In order to visualize muscle abundance, namely in terms of reproducing muscle stiffness and muscle synergy encoded in the evolved covariance matrix, we assume the distribution of parameters to be multi-variate Gaussian and multi-modal, as the motor space is of high dimension, and there can be different muscle group posture configurations while keeping the end effector fixed. Therefore a multi-variate Gaussian Mixture Model

[18] is fit to the collected data in . By assuming a distribution of Gaussian parameters over the data samples

, a prior multi-variate Gaussian distribution is introduced

are the weights for each Gaussian mixture component, and the posterior distribution is estimated by using Bayes rule [18], such that the posterior distribution would preserve the form Gaussian mixture model, i.e.,

where the parameters and weights

are updated using Expectation Maximization (EM) to maximize the likelihood

[18]. The number of mixture models is estimated using Bayesian Information Criterion (BIC) [18] for , where the lowest BIC of is taken. Finally, we sample from the mixture model with updated parameters and weights and forward on the robot.

V-C Experiment and Results

Fig. 6: 10 selected goals for motor babbling in the final goal space , the color point clouds are the local samples within 2cm radius of the queried goals, which is used to initialize the step-size for the CMA-ES trials

We evenly selected 10 goals in the final goal space to perform online motor babbling. The selected goals and their local samples within 2cm radius are shown in Figure 6. The goals are selected to show case the generality of querying any goal within the goal space for motor babbling. Around the edges, goal 26, 5, 89, 52 are chosen, and near the centroid home position, goal 44 and 39 are selected. The rest goals 17, 43, 55, and 60 are to populate the rest of the goal space. It can be expected and observed that more samples were generated near the home posture, since in online goal babbling the arm returns to with probability , whereas goals around the edges have only a few samples, such as goal 26 and 89.

Fig. 7:

Comparing the reaching error and muscle variability of directed goal babbling and local motor babbling using CMA-ES, where CMA-ES not only increased the means and standard deviations of all 24 muscle variances for the 10 queried goals, but the reaching error has also been reduced

For each selected goal, trials of CMA-ES is performed as in Algorithm 1, where each trial takes on average 5 minutes experiment time on the robot. Muscle stiffness is then reproduced by first fitting the collected neighboring samples to the Gaussian mixture model, which serves as a baseline learned during goal babbling, followed by another experiment fitting to the mixture model and the subsequent sampling. 200 samples from the mixture model is evaluated on the robot, the mean and standard deviation of the reaching error, and of pressure variance are plotted. As illustrated in Fig. 7, CMA-ES outperforms the baseline in terms of both larger muscle pressure variance and smaller goal reaching error, where the average lies close to the 2cm prototype sphere radius. Since online goal babbling favors kinematic and direction efficiency by reducing motor redundancy, the sampled muscle pressure generally varies trivially compared to the ones generated from the CMA-ES GMM model, which expands the variance in search of global optimum while keeping the goal reaching accuracy. Due to non-stationary changes of the possible posture configurations , the local neighboring samples no longer lead to a close position to the goal, however of the neighboring goals can be used for initializing the step-size , and initializing the mean vector from , to adapt to non-stationary changes and maintain the goal reaching accuracy while performing motor babbling.

Fig. 8: One evolution trial for goal 44, the search of the step-size increases until the defined optimum objective function value is found

It can be observed that for goal 44, which is closest to the home position, the motor variance doesn’t increase much as other queried goals. This is because every time the interpolated directed goal path comes across the centroid home region, goal 44 has a higher chance of collecting more samples of varied motor configurations within the neighborhood. Nevertheless, CMA-ES still explores motor redundancy rather efficiently. As shown in Fig. 8, the evolution trial expands the maximum and minimum standard deviation of the search, i.e., such that the optimum is reached. After 5 such evolution trials, the sampled GMM data is used to estimate the covariance, compared with the covariance estimate from the baseline GMM data. As shown in 9, CMA-ES preserves the structure while enhancing the variance on the diagonal, while also discovers more correlation within different groups of muscles, which can be prominently observed on the robot in Fig. 10.

Fig. 9: Comparing baseline and CMA-ES covariances, where the largest change of variance occurs at muscle pair (8,20), changing from 0.003 to -0.01, where the -0.01 covariance corresponds to the standard deviation of 0.1 MPa pressure change, consituting 25% of the PAM actuation range

V-D Interpreting Muscle Abundance

# Muscle Name Function
3 serratus anterior pulls scapula forward
7 latissimus dorsi rotates scapula downward
8 rear deltoid abducts, flexes,
10 front deltoid and extends
11 medial deltoid the shoulder
15 biceps brachii flexes and supinates the forearm
18 brachialis flexes the elbow
19 pronator pronates the hand
20 supinator supinates the hand
TABLE I: Muscle Names and Functions

The muscle pressure variability in the covariance encodes muscle abundance, which can be interpreted as muscle stiffness and static muscle synergies. Loosely speaking, muscle synergy is defined as a co-activation pattern of muscles in a certain movement from a single neural command signal[19]. It can be argued that muscle synergy is a way of kinetically constraining the redundant motor control of limited DoFs, or as neural strategies to handle the sensorimotor systems[20].

Fig. 10: Static muscle synergies, the labelled muscles are color coded in green (low), orange (medium), and red (high) to indicate the state of pressure actuation. A relaxed arm posture with a lowered shoulder can be observed in (a) and (c), whereas a stiffened arm with pronating hand and a lifted shoulder and can be observed in (b) and (d), while keep the end effector position fixed.

We claim no sides in the sensorimotor learning of humans, however, by constraining the end effector position of the musculoskeletal robot arm, the static muscle synergies and stiffness can be encoded in the covariance matrix and provide some useful insights. In Fig. 9, muscles of high variances, namely muscle 3, 7, 8, 10, 11, 15, 18, 19, 20 are of particular interest, where muscle 7 and 8, 20 and 8 are highly negatively correlated. Inspired by the human’s upper limb, the PAMs of the robot arm mimics the function of human arm muscles, as illustrated in Table I. By fitting the data in the mixture models and subsequently applying sampling, we can observe the co-activation patterns of the muscles. As shown in Fig.10 and 10, the upper limb first reaches goal 44 with a relaxed arm posture and a lowered adducted shoulder, whereas in Fig.10 and 10 the end effector position is maintained by stiffening the arm, lifting the extended shoulder, and pronating the hand. The negative correlation of muscle 7 and 8 can be interpreted as the coordination of extension and abduction, as well as the flexion and adduction of the shoulder. Muscle 8 and 20 coordinate shoulder abduction with a supinating hand, and by adducting shoulder while pronating the hand.

Vi Conclusions

We have implemented directed goal babbling[11] to learn the inverse kinematics of a 10 DoF musculoskeletal robot arm actuated by 24 PAMs. We defined the goal space by empirically sampling 2000 random postures and forcing a convex hull ready for learning, and post-processed the goal space to removed outlier goals. The result shows an average reaching error of 1.8 cm, where the reaching accuracy achievable by the robot is 1.2 cm. The simple heuristics and approximation of the goal space allows us to use directed goal babbling to learn a larger sensory space compared to a well-defined yet small partial task space, and promote more efficient mapping to the motor space compared to active exploration[10]. Nevertheless, learning with a forced convex goal space where the intrinsic task space is non-convex introduces outlier goals, which the corresponding directed babbling can be misleading. A future research direction of integrating directed goal babbling with active exploration could be of interest, where the goal space grid can be defined large enough to encapsulate the whole task space, and active exploration guided by the k-d tree splitting and progress logging can indicate the learned task space while still keep the bootstrapping flavor of the inverse model.

We further extended directed goal babbling to local online motor babbling using CMA-ES in search of more motor abundance. By initializing the evolution strategy with local samples generated from goal babbling, any point within the goal space can be queried for motor abundance. The idea is to intentionally initializing the mean vector of CMA-ES slightly away from the queried goal. By expanding covariance and setting the stop condition to meeting the set optimum of the objective function value, efficient motor babbling data can be generated locally around the queried goal with a few CMA-ES trials of different initializations from the neighboring goals. We evenly selected 10 goals within the goal space to showcase the generality of local online motor babbling. The results show that our proposed method has significantly increased the average muscle pressure variances, while keeping the end effector more stable and closer to the queried goals, compared with the goal babbling baseline. Even in the home position where motor abundance has already been well-explored, local motor babbling shows a maximum increased standard deviation of 0.1 MPa, constituting 25% of the muscle pressure actuation range. Our method also adapts to queried goals near the edges of the goal space where samples for initialization are sparse due to the uneasy posture of the robot arm around such goals.

By fitting Gaussian mixture models to the data collected using local motor babbling, the sampling of the GMMs can reproduce motor abundance in terms of muscle stiffness and muscle synergies encoded in the evolved single-mode covariance matrix. Muscle stiffness can be seen on the inflating and deflating muscles, and muscle synergies can be clearly observed in the covariance where variances and correlations are strong, as well as when GMM sampled postures are applied on the robot correspondingly. The bonus that comes with the encoded covariance and mixture models is that the queried motor abundance can be captured and reproduced by distributions, which enables the formulation of trials for reinforcement learning in future research, such as learning weight lifting with varied muscle stiffness, planning trajectories and learning dynamics using via-points and the locally queried motor abundance library.


The author would like to thank Hiroaki Masuda for his mechanical maintenance of the upper limb robot, and Dr. Matthias Rolf for his suggestions in implementing directed online goal babbling.


  • [1] R. P. Paul, Robot manipulators: mathematics, programming, and control: the computer control of robot manipulators.   Richard Paul, 1981.
  • [2] D. Nguyen-Tuong and J. Peters, “Model learning for robot control: a survey,” Cognitive processing, vol. 12, no. 4, pp. 319–340, 2011.
  • [3] M. Rolf and J. J. Steil, “Efficient exploratory learning of inverse kinematics on a bionic elephant trunk,”

    IEEE transactions on neural networks and learning systems

    , vol. 25, no. 6, pp. 1147–1160, 2014.
  • [4] K. Hosoda, S. Sekimoto, Y. Nishigori, S. Takamuku, and S. Ikemoto, “Anthropomorphic muscular–skeletal robotic upper limb for understanding embodied intelligence,” Advanced Robotics, vol. 26, no. 7, pp. 729–744, 2012.
  • [5] R. Niiyama, S. Nishikawa, and Y. Kuniyoshi, “Biomechanical approach to open-loop bipedal running with a musculoskeletal athlete robot,” Advanced Robotics, vol. 26, no. 3-4, pp. 383–398, 2012.
  • [6] A. Hitzmann, H. Masuda, S. Ikemoto, and K. Hosoda, “Anthropomorphic musculoskeletal 10 degrees-of-freedom robot arm driven by pneumatic artificial muscles,” Advanced Robotics, vol. 32, no. 15, pp. 865–878, 2018.
  • [7] P. Gaudiano and S. Grossberg, “Vector associative maps: Unsupervised real-time error-based learning and control of movement trajectories,” Neural networks, vol. 4, no. 2, pp. 147–183, 1991.
  • [8] Y. Demiris and A. Dearden, “From motor babbling to hierarchical learning by imitation: a robot developmental pathway,” 2005.
  • [9] A. D’Souza, S. Vijayakumar, and S. Schaal, “Learning inverse kinematics,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), vol. 1.   IEEE, 2001, pp. 298–303.
  • [10]

    A. Baranes and P.-Y. Oudeyer, “Active learning of inverse models with intrinsically motivated goal exploration in robots,”

    Robotics and Autonomous Systems, vol. 61, no. 1, pp. 49–73, 2013.
  • [11] M. Rolf, J. J. Steil, and M. Gienger, “Online goal babbling for rapid bootstrapping of inverse models in high dimensions,” in Development and Learning (ICDL), 2011 IEEE International Conference on, vol. 2.   IEEE, 2011, pp. 1–8.
  • [12] M. Latash, “There is no motor redundancy in human movements. there is motor abundance,” 2000.
  • [13] M. L. Latash, “The bliss (not the problem) of motor abundance (not redundancy),” Experimental brain research, vol. 217, no. 1, pp. 1–5, 2012.
  • [14] P. Singh, S. Jana, A. Ghosal, and A. Murthy, “Exploration of joint redundancy but not task space variability facilitates supervised motor learning,” Proceedings of the National Academy of Sciences, vol. 113, no. 50, pp. 14 414–14 419, 2016.
  • [15]

    T. Kohonen, “The self-organizing map,”

    Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
  • [16] N. Hansen, “The cma evolution strategy: A tutorial,” arXiv preprint arXiv:1604.00772, 2016.
  • [17] N. Hansen, Y. Akimoto, and P. Baudis, “CMA-ES/pycma on Github,” Zenodo, DOI:10.5281/zenodo.2559634, Feb. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.2559634
  • [18] C. M. Bishop, Pattern recognition and machine learning.   springer, 2006.
  • [19] G. Torres-Oviedo, J. M. Macpherson, and L. H. Ting, “Muscle synergy organization is robust across a variety of postural perturbations,” Journal of neurophysiology, 2006.
  • [20] M. C. Tresch and A. Jarc, “The case for and against muscle synergies,” Current opinion in neurobiology, vol. 19, no. 6, pp. 601–607, 2009.