I Introduction
The human body is an overactuated system: not only does it have a higher-dimensional motor space than action space, i.e., more muscles than joints, it also has more degrees of freedom (DoFs) than necessary to achieve a given motor task. How such an effector-redundant system adaptively coordinates movements remains a challenging problem. In the field of robot learning, when rigid body links with pure rotation and translation are assumed
[1], model learning is commonly used to learn the forward or inverse models of kinematics and dynamics for accurate yet agile control [2]. However, for biomechanical and soft robots such as elephant trunks [3] or musculoskeletal systems [4][5], where models based on rigid body links are no longer available, learning becomes difficult due to the highly redundant and nonstationary nature of such systems. This paper investigates the reaching skills and motor variability of the reached points on a musculoskeletal robot arm [6], an overactuated system of 24 Pneumatic Artificial Muscles (PAMs) actuating 10 DoFs, as shown in Fig. 1. Traditionally, this problem could be addressed by learning the forward kinematics using motor babbling, exploring the motor-sensory mapping from scratch [7, 8, 9] until the robot can eventually predict the effects of its own actions. However, autonomous exploration without prior knowledge in motor babbling does not scale well to high-dimensional sensorimotor spaces, due to the rather inefficient sampling of random motor commands in overactuated systems. An alternative in [10]
suggests that learning inverse kinematics by goal babbling with active exploration avoids the curse of dimensionality, simply because the goal space is of much smaller dimension than the redundant motor space. Nonetheless,
[10] assumes that the sensorimotor space can be entirely explored, which is not feasible in practice for high-dimensional motor systems [3]. Another alternative is to specify the goal space a priori as a grid and sample the goal grid points to guide exploration [11], such that the sensorimotor mapping can be sufficiently generalized and bootstrapped for efficient online learning. This approach has been quantitatively evaluated on an elephant trunk robot [3], reaching an average of sub-centimeter accuracy within reasonable experiment time. We therefore implement and further extend the directed goal babbling of [3]. Since the goal space of the robot arm is unknown and nonconvex [6], we empirically estimate it with randomly generated postures and force a convex hull such that directed goal babbling can be applied, subsequently removing the outlier goals from the goal space after learning.
Although the above works aim to reduce motor redundancy for learning [7, 8, 9, 10, 11, 3], it can be argued that motor redundancy in human musculoskeletal systems is actually the keystone of natural movements with flexibility and adaptability, and should hence be termed motor abundance rather than redundancy [12][13]. In robot motor learning, [14] also suggests that joint redundancy facilitates motor learning, whereas task space variability does not. We therefore build on directed goal babbling [3] and propose local online motor babbling, in order to explore motor abundance while keeping task space variability fixed on the musculoskeletal robot arm. Local online motor babbling uses CMA-ES initialized with local samples generated from directed goal babbling, such that exploration in the motor space is effectively constrained locally to any queried goal within the goal space, and efficiently generated by adapting the covariance.
This paper is organized as follows: Sections II and III review directed online goal babbling and CMA-ES. Section IV introduces the simple heuristics used to define the goal space, implements directed goal babbling on the musculoskeletal robot arm, and evaluates the learning results. Section V proposes, implements, and evaluates local online motor babbling using CMA-ES to query motor abundance, while providing some insights into muscle stiffness and muscle synergy of the musculoskeletal robot system. Section VI concludes the paper and discusses possible future research directions.
II Directed Goal Babbling
Given the specified convex goal space $X^\star$ encapsulating the goal points, and denoting the set of all reachable commands in the motor space as $Q$, the aim is to learn the inverse kinematics model $g: X^\star \to Q$, which generalizes all points in the goal space to a subset of solutions in the motor space. Starting from the known home position $x^\star_{home}$ and home posture $q_{home}$, i.e., the inverse mapping $g(x^\star_{home}) = q_{home}$, the goal-directed exploration is generated by
q_t = g(x^\star_t, \theta_t) + E_t(x^\star_t)    (1)
where $g(\cdot, \theta_t)$ is the inverse mapping given the learning parameters $\theta_t$, and $E_t(x^\star_t)$ adds perturbation noise to discover new positions or more efficient motor commands for reaching goals. At every time step, the motor system forwards the perturbed inverse estimate $q_t$, and the actual
samples $(x_t, q_t)$ are used for regression, where prototype vectors and local linear mappings [15] serve as the regression model and as a monitor of the progress of exploration in the defined goal space. The major part of directed goal babbling is to direct the babbling of the end effector at specified goals and target positions. Each trial of goal babbling is directed at one goal randomly chosen from $X^\star$, and continuous piecewise-linear targets are interpolated along the path
x^\star_{t+1} = x^\star_t + \delta_x \frac{X^\star_g - x^\star_t}{\| X^\star_g - x^\star_t \|}    (2)
where $x^\star_t$ and $X^\star_g$ are the target position and final goal of the trial, and $\delta_x$ is the step size. Target positions are generated until $x^\star_t$ is closer than $\delta_x$ to $X^\star_g$, after which a new goal is chosen. The purpose of directed goal babbling is to generate smooth movement around the end effector position, such that the locally learned prototype vectors can bootstrap and extend the exploration of the goal space, and allow the integration of the following weighting scheme
w^{dir}_t = \frac{1}{2}\left(1 + \cos\angle(x_t - x_{t-1},\; x^\star_t - x^\star_{t-1})\right)    (3)

w^{eff}_t = \frac{\| x_t - x_{t-1} \|}{\| q_t - q_{t-1} \|}    (4)

w_t = w^{dir}_t \cdot w^{eff}_t    (5)
where $w^{dir}_t$ and $w^{eff}_t$ measure the direction and kinematic efficiency of the movement, such that inconsistencies of a folded manifold and redundant joint positions can be optimized away [11]. The multiplicative weighting factor $w_t$ is then integrated into the gradient descent that fits the currently generated samples by reducing the weighted squared error (see Appendix).
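The weighting scheme in (3)-(5) can be sketched as follows; the function name, argument names, and test values are illustrative, assuming the direction weight is the cosine-based term and the efficiency weight is task-space displacement per motor-space displacement as described above:

```python
import numpy as np

def babbling_weight(x_prev, x_curr, xs_prev, xs_curr, q_prev, q_curr, eps=1e-12):
    """Multiplicative sample weight combining direction and kinematic-efficiency
    terms, a sketch of Eqs. (3)-(5)."""
    dx = x_curr - x_prev          # observed end-effector movement
    dxs = xs_curr - xs_prev       # intended target movement
    # direction weight in [0, 1]: 1 when the movement follows the target direction
    cos = np.dot(dx, dxs) / (np.linalg.norm(dx) * np.linalg.norm(dxs) + eps)
    w_dir = 0.5 * (1.0 + cos)
    # efficiency weight: task-space displacement per motor-space displacement
    w_eff = np.linalg.norm(dx) / (np.linalg.norm(q_curr - q_prev) + eps)
    return w_dir * w_eff

# a movement that exactly follows the target direction gets w_dir = 1
w = babbling_weight(np.zeros(3), np.array([0.01, 0.0, 0.0]),
                    np.zeros(3), np.array([0.01, 0.0, 0.0]),
                    np.zeros(24), 0.02 * np.ones(24))
```

Samples that move the end effector against the intended direction, or that spend large motor-space displacement for little task-space movement, receive weights near zero and barely influence the regression.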
To prevent drifting to irrelevant regions and to facilitate bootstrapping on the local prototype centers, the system returns to the home posture $q_{home}$ with probability $p_{home}$ instead of following another goal-directed movement. Returning to the home posture stabilizes the exploration in the known area of the sensorimotor space [12], [18], similar to infants returning their arms to a comfortable resting posture between practices:

x^\star_{t+1} = x^\star_t + \delta_x \frac{x^\star_{home} - x^\star_t}{\| x^\star_{home} - x^\star_t \|}    (6)

i.e., the system moves from the last posture to the home posture in the same way as in (2), by linearly interpolating the via-points along the path until $x^\star_{home}$ is reached.
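The piecewise-linear target generation of (2), which is also used for the return to the home position, can be sketched as follows; the function name, step size, and coordinates are illustrative:

```python
import numpy as np

def interpolate_targets(x_start, x_goal, step=0.01, tol=1e-6):
    """Generate piecewise-linear targets from x_start toward x_goal,
    advancing a fixed step length per time step (a sketch of Eq. (2))."""
    targets = []
    x = np.asarray(x_start, dtype=float)
    x_goal = np.asarray(x_goal, dtype=float)
    # advance until the current target is closer than one step to the goal
    while np.linalg.norm(x_goal - x) > step + tol:
        x = x + step * (x_goal - x) / np.linalg.norm(x_goal - x)
        targets.append(x.copy())
    targets.append(x_goal.copy())  # the final target is the goal itself
    return targets

targets = interpolate_targets([0.0, 0.0, 0.0], [0.0, 0.0, 0.05], step=0.01)
```

Each returned target is forwarded through the perturbed inverse estimate of (1), so the end effector babbles in small smooth increments along the path rather than jumping directly to the goal.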
The exploratory noise, or motor perturbation $E_t$ in (1), is crucial for discovering new postures that would otherwise not be found by the inverse estimate [12], [29]. The local surrounding of the inverse estimate is explored with an i.i.d. normal distribution in each motor dimension, and these distribution parameters are varied with a normalized Gaussian random walk, so the noise is modeled as

E_t(x^\star) = A_t x^\star + b_t    (7)

where all entries of the matrix $A_t$ and the vector $b_t$ are initialized from $\mathcal{N}(0, \sigma^2)$ and varied by a normalized Gaussian random walk that keeps their marginal variance at $\sigma^2$.
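A sketch of this perturbation model follows; the class name, smoothness parameter `delta`, and amplitude `sigma` are illustrative choices, assuming the normalized random walk keeps each entry's marginal standard deviation constant:

```python
import numpy as np

class ExploratoryNoise:
    """Structured perturbation E_t(x) = A_t x + b_t whose entries perform a
    normalized Gaussian random walk, a sketch of Eq. (7)."""
    def __init__(self, dim_q, dim_x, sigma=0.05, delta=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.sigma, self.delta = sigma, delta
        self.A = sigma * self.rng.standard_normal((dim_q, dim_x))
        self.b = sigma * self.rng.standard_normal(dim_q)

    def step(self):
        # add a small Gaussian increment, then renormalize so each entry
        # keeps marginal standard deviation sigma over time
        norm = np.sqrt(1.0 + self.delta ** 2)
        self.A = (self.A + self.delta * self.sigma
                  * self.rng.standard_normal(self.A.shape)) / norm
        self.b = (self.b + self.delta * self.sigma
                  * self.rng.standard_normal(self.b.shape)) / norm

    def __call__(self, x):
        return self.A @ np.asarray(x, dtype=float) + self.b

noise = ExploratoryNoise(dim_q=24, dim_x=3)
e0 = noise(np.array([0.1, 0.2, 0.3]))
noise.step()
e1 = noise(np.array([0.1, 0.2, 0.3]))
```

Because the perturbation varies smoothly from step to step instead of being redrawn independently, consecutive motor commands stay coherent while still drifting through new regions of the motor space.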
After learning, the average reaching accuracy is evaluated by querying the inverse model for every goal within the defined goal space $X^\star$, with a simple feedback controller to adapt to execution failures. An execution failure occurs when the inverse estimate cannot be executed as predicted, due to interference and the nonstationary bionic robot design changing constantly over time. Given the queried goal $x^\star$ and the predicted posture $g(x^\star)$, the feedback controller slightly shifts the queried goal from $x^\star$ to $\hat{x}^\star$, then forwards the inverse estimate $g(\hat{x}^\star)$. Target shifting follows the currently observed error $x^\star - x_t$, integrated over time:

\hat{x}^\star_{t+1} = \hat{x}^\star_t + \gamma \, (x^\star - x_t)    (8)
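The feedback loop around the learned inverse model can be sketched as follows; `inverse_model`, `forward_execute`, the gain, and the iteration count are illustrative placeholders for the learned mapping and the robot:

```python
import numpy as np

def feedback_reach(goal, inverse_model, forward_execute, gain=0.5, n_iter=5):
    """Shift the queried goal by the integrated observed error so that the
    inverse estimate lands closer to the true goal (a sketch of Eq. (8))."""
    goal = np.asarray(goal, dtype=float)
    shifted = goal.copy()
    x = forward_execute(inverse_model(shifted))
    for _ in range(n_iter):
        shifted = shifted + gain * (goal - x)   # integrate the observed error
        x = forward_execute(inverse_model(shifted))
    return x, shifted

# toy check: a forward map with a constant bias is compensated by shifting
bias = np.array([0.02, 0.0, 0.0])
x_final, _ = feedback_reach(np.array([0.1, 0.1, 0.1]),
                            inverse_model=lambda g: g,
                            forward_execute=lambda q: q + bias)
```

In the toy setting the residual error shrinks by the gain factor each iteration, which mirrors how systematic reaching offsets from the nonstationary hardware are compensated without retraining the inverse model.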
III CMA-ES
CMA-ES is a black-box optimization method that minimizes an objective function $f: \mathbb{R}^n \to \mathbb{R}$, where $f$ is assumed to be a high-dimensional, nonconvex, nonseparable, and ill-conditioned mapping of the multivariate state space. The idea of CMA-ES is to sample a population from a multivariate normal distribution, evaluate the population to select the good candidates, and update the search distribution parameters by adapting the covariance and shifting the mean of the distribution according to the candidates.
Given a start point $m \in \mathbb{R}^n$ and initializing the covariance to the identity matrix $C = I$, the search points in one population iteration are sampled as follows:

x_k \sim m + \sigma \, \mathcal{N}(0, C), \quad k = 1, \dots, \lambda    (9)

where $m$ is the mean vector, $\sigma$ is the step size, and $\lambda$ is the population size. For notational simplicity, the iteration index is henceforth omitted.
The mean vector is updated using non-elitist selection [16]. Let $x_{i:\lambda}$ denote the $i$-th best solution in the population of $\lambda$ samples; the best $\mu$ points from the sampled population are selected, such that $f(x_{1:\lambda}) \le \dots \le f(x_{\mu:\lambda})$, and weighted intermediate recombination is applied:

m \leftarrow \sum_{i=1}^{\mu} w_i \, x_{i:\lambda}, \quad \sum_{i=1}^{\mu} w_i = 1, \; w_1 \ge \dots \ge w_\mu > 0    (10)
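One generation of sampling (9) and weighted recombination (10) can be sketched as follows; the function name, population size, and log-decreasing weight choice are illustrative, following the standard formulation:

```python
import numpy as np

def sample_and_recombine(m, sigma, C, f, lam=12, rng=None):
    """One CMA-ES generation: sample lambda points from N(m, sigma^2 C),
    rank them by f, and recombine the best mu with decreasing weights
    (a sketch of Eqs. (9)-(10))."""
    rng = rng or np.random.default_rng(0)
    n = len(m)
    L = np.linalg.cholesky(C)
    X = m + sigma * (rng.standard_normal((lam, n)) @ L.T)  # Eq. (9)
    order = np.argsort([f(x) for x in X])                  # rank by fitness
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))    # decreasing weights
    w /= w.sum()
    m_new = w @ X[order[:mu]]                              # Eq. (10)
    return m_new, X

f = lambda x: np.sum(x ** 2)   # toy sphere objective for illustration
m0 = np.ones(5)
m1, _ = sample_and_recombine(m0, 0.3, np.eye(5), f)
```

The recombined mean moves toward the better half of the population, which is the only information about $f$ the strategy uses; no gradients are required.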
The step size is updated using cumulative step-size adaptation (CSA). The intuition is that when the evolution path, i.e., the sum of successive steps, is short, single steps tend to be uncorrelated and cancel each other out, so the step size should be decreased. On the contrary, when the evolution path is long, single steps point in similar directions and tend to be correlated, so the step size is increased. Initializing the evolution path vector $p_\sigma = 0$ and setting the constants $c_\sigma$ and $d_\sigma$, the step size is updated as:
p_\sigma \leftarrow (1 - c_\sigma)\, p_\sigma + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{eff}} \; C^{-1/2} \frac{m - m_{old}}{\sigma}    (11)

\sigma \leftarrow \sigma \exp\!\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\| p_\sigma \|}{\mathbb{E}\|\mathcal{N}(0, I)\|} - 1 \right) \right)    (12)
The essential part of the evolution strategy is the covariance matrix adaptation. The line distribution adapted using the rank-one update increases the likelihood of generating successful steps, because the adaptation follows a natural gradient approximation of the expected fitness of the population. Initializing the evolution path $p_c = 0$ and setting the constants $c_c$ and $c_1$, the update is:
p_c \leftarrow (1 - c_c)\, p_c + \sqrt{c_c (2 - c_c)\, \mu_{eff}} \; \frac{m - m_{old}}{\sigma}    (13)

C \leftarrow (1 - c_1)\, C + c_1 \, p_c p_c^{\top}    (14)
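The step-size and covariance updates of (11)-(14) can be sketched together as follows; the learning-rate formulas are common default-style choices and are illustrative, not the exact constants used in the experiments:

```python
import numpy as np

def csa_rank_one_update(p_sigma, p_c, m_new, m_old, sigma, C, mu_eff):
    """Cumulative step-size adaptation and rank-one covariance update,
    a sketch of Eqs. (11)-(14)."""
    n = len(m_new)
    c_sigma, d_sigma = 4.0 / (n + 4.0), 1.0
    c_c, c_1 = 4.0 / (n + 4.0), 2.0 / (n ** 2 + 6.0)
    y = (m_new - m_old) / sigma
    # whitening transform C^{-1/2} via eigendecomposition
    vals, vecs = np.linalg.eigh(C)
    C_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    # Eq. (11): evolution path for the step size
    p_sigma = ((1 - c_sigma) * p_sigma
               + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * C_inv_sqrt @ y)
    # Eq. (12): grow sigma when the path is longer than a random walk's
    chi_n = np.sqrt(n) * (1 - 1.0 / (4 * n) + 1.0 / (21 * n ** 2))
    sigma = sigma * np.exp((c_sigma / d_sigma)
                           * (np.linalg.norm(p_sigma) / chi_n - 1))
    # Eq. (13): evolution path for the covariance
    p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c) * mu_eff) * y
    # Eq. (14): rank-one update of C
    C = (1 - c_1) * C + c_1 * np.outer(p_c, p_c)
    return p_sigma, p_c, sigma, C

n = 5
ps, pc, s, C = csa_rank_one_update(np.zeros(n), np.zeros(n),
                                   np.full(n, 0.1), np.zeros(n),
                                   0.5, np.eye(n), mu_eff=3.0)
```

After the update, $C$ gains variance along the direction of the accumulated mean shift, which is exactly the mechanism later exploited to spread exploration over locally successful postures.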
IV Inverse Kinematics Learning
We use the 10 DoF musculoskeletal robot arm from [6] for the experiments, where the robot is controlled via ROS messages. The arm is driven by 24 pneumatic muscles, each actuated over a fixed pressure range (in MPa).
As shown in Fig. 1, the hand of the robot is replaced with a tennis ball as the color marker, and tracking of the end effector is performed relative to the center of the red marker as the origin, using an Intel RealSense ZR300. The tracking, however, introduces an error of up to 1 cm in depth, i.e., along the x-axis, and sub-millimeter errors along the y- and z-axes. The colored point cloud overlaid in ROS rviz is the specified convex goal space, as shown in Fig. 1. The control accuracy of the robot is tested according to [3]: by repeatedly executing random postures, the average Euclidean norm error over the repetitions is computed in cm.
IV-A Defining the Goal Space
The complete task space of the upper-limb robot is unknown and nonconvex, yet directed goal babbling requires the specified goal space to be convex in order to bootstrap efficiently and allow the integration of the weighting scheme in (5). We therefore first estimate the goal space empirically by generating random postures for each muscle within its pressure range, and take the encapsulating convex hull as the empirical goal space. In order to approximate uniform samples in the empirical goal space for efficient online learning and evaluation, a cube grid with fixed spacing encapsulating it is defined. The sampled convex-hull goal grid in Fig. 1 is then the intersection of the empirical goal space and the cube grid. However, as shown in Fig. 2, the sampled task space is a slanted, nonconvex, irregular ellipsoid; forcing a convex hull on the random posture samples therefore introduces non-reachable regions into the goal space. This is addressed later with a similar set operation that removes the outlier goals using the learned prototype vector space.
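The grid-hull intersection can be sketched as follows; the use of SciPy, the function name, the 5 cm test spacing, and the uniform stand-in samples are all illustrative assumptions:

```python
import numpy as np
from scipy.spatial import Delaunay

def goal_grid_in_hull(points, spacing=0.02):
    """Intersect a regular cube grid with the convex hull of empirically
    sampled end-effector positions, a sketch of the goal-grid construction."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    axes = [np.arange(l, h + spacing, spacing) for l, h in zip(lo, hi)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    hull = Delaunay(points)                 # hull-membership test via triangulation
    inside = hull.find_simplex(grid) >= 0   # >= 0 means inside some simplex
    return grid[inside]

rng = np.random.default_rng(0)
samples = rng.uniform(-0.1, 0.1, size=(200, 3))  # stand-in for observed positions
goals = goal_grid_in_hull(samples, spacing=0.05)
```

The same membership test can later be reused with the learned prototype spheres instead of the hull, which is how the outlier goals introduced by the forced convexity are pruned.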
IV-B Experiment and Results
The experiment is conducted with a target step length corresponding to a target velocity (in cm/s) that allows the robot to generate smooth local movements, and a sampling rate generating 5 targets and directed micro-movements for learning. After every 4000 samples, a performance evaluation is carried out online. The learning experiment, including online evaluations, amounts to less than 2 hours of real time. As illustrated in Fig. 3, the learning bootstraps quickly within the first 4000 samples, followed by a slow convergence until 16000 samples. When the feedback controller is applied, the performance error drops to 3.4 cm. However, the goal space still contains many outlier goals, which are the non-reachable regions introduced by forcing the convex hull. A set intersection similar to the above is therefore applied between the learned prototype spheres and the goal space, taking the encapsulated space of the prototype spheres to obtain the final goal space, as shown in Fig. 5, where the number of goals is reduced from 179 to 94. Evaluating these 94 goals again with the feedback controller reduces the performance error further, to an average of 1.8 cm in Fig. 3. However, due to the forced convex hull, local inverse models cannot regress efficiently at the edge of the task space, and the error distribution still shows a few errors larger than 3 cm, which can be further reduced later by motor babbling using CMA-ES.
V Learning Motor Redundancy
CMA-ES explores by expanding the search distribution of the parameters, shifting the mean and expanding the covariance until an optimal solution is found within that distribution, and then shrinking the covariance and shifting the mean toward the optimum. By intentionally setting the initial mean vector slightly away from the optimum, i.e., the posture that leads to the end effector position closest to the goal, CMA-ES naturally expands the covariance while keeping the search within the vicinity of the queried goal, as the objective function is set to minimize the goal-reaching error. Essentially, CMA-ES is used to generate motor babbling data effectively, which is achieved by initializing the mean vector with neighboring goal postures of the queried goal, and the step size with the empirical variance estimate of the local samples around the queried goal gathered during the goal babbling process.

V-A Local Online Motor Babbling
When learning inverse kinematics using online goal babbling, since there are multiple postures reaching the same goal, it is assumed that we do not need to know all the redundancy, and only learn the postures with the most direction and kinematic efficiency by integrating the weighting scheme (5) into the optimization. In fact, the set of valid postures is not only unknown, and may never be exhaustively explored on a physical system, but also nonstationary due to the nature of a musculoskeletal robot design with PAMs. This is addressed by the simple feedback controller in (8), where execution failures due to such changes are adapted to by slightly shifting the queried goal in proportion to the observed Euclidean error.
As illustrated in Algorithm 1, the queried goal, the learned inverse model, and the neighboring postures collected during the goal babbling process are the inputs to online motor babbling. The aim of the algorithm is to output a new posture configuration set, from which different muscle stiffnesses can be generated while keeping the end effector position fixed. The initialization sets the gain and the number of iterations of the feedback controller, the number of CMA-ES trials, and the prototype sphere radius. We use the pycma library [17] to implement CMA-ES, where variables are encoded implicitly in the objective function [16]. The objective function is simply set as the Euclidean norm of the distance to the goal scaled by a constant, and reaching an empirically chosen optimal objective value is the stopping criterion for each CMA-ES trial.
Each trial of CMA-ES starts by iterating the feedback controller and finding the posture that leads closest to the neighboring goal, which is subsequently used to initialize the mean vector. The covariance is initialized to an identity matrix, which allows isotropic search and avoids bias. To initialize the step size, an empirical variance is estimated from the union of the local goal babbling samples and the feedback-controller samples, and the mean of the variance is taken as the initialization. The union of the two sets ensures sufficient data for a feasible estimate. Near the home position, which is the centroid of the goal space, many data samples are available, as online goal babbling often returns home. Around the edges of the goal space, however, there are often very few local samples, sometimes fewer than the action space dimension, i.e., the 24 muscles. By also taking in the samples generated by the feedback controller, a better initialization of the step size can be estimated robustly.
V-B Visualizing Muscle Abundance
In order to visualize muscle abundance, namely reproducing the muscle stiffness and muscle synergy encoded in the evolved covariance matrix, we assume the distribution of parameters to be multivariate Gaussian and multimodal, as the motor space is high-dimensional and there can be different muscle-group posture configurations that keep the end effector fixed. A multivariate Gaussian mixture model [18] is therefore fit to the collected data. Assuming a distribution of Gaussian parameters over the data samples, a prior mixture of multivariate Gaussians with component weights is introduced, and the posterior distribution is estimated using Bayes' rule [18], such that the posterior preserves the form of a Gaussian mixture model. The parameters and weights are updated using Expectation Maximization (EM) to maximize the likelihood [18]. The number of mixture components is estimated using the Bayesian Information Criterion (BIC) [18], where the component count with the lowest BIC is taken. Finally, we sample from the mixture model with the updated parameters and weights, and forward the samples on the robot.

V-C Experiment and Results
We evenly selected 10 goals in the final goal space to perform online motor babbling. The selected goals and their local samples within a 2 cm radius are shown in Fig. 6. The goals are selected to showcase the generality of querying any goal within the goal space for motor babbling. Around the edges, goals 26, 5, 89, and 52 are chosen; near the centroid home position, goals 44 and 39 are selected; and the remaining goals 17, 43, 55, and 60 populate the rest of the goal space. As expected, more samples were generated near the home posture, since in online goal babbling the arm returns home with probability $p_{home}$, whereas goals around the edges, such as goals 26 and 89, have only a few samples.
For each selected goal, trials of CMA-ES are performed as in Algorithm 1, where each trial takes on average 5 minutes of experiment time on the robot. Muscle stiffness is then reproduced by first fitting the collected neighboring samples to a Gaussian mixture model, which serves as a baseline learned during goal babbling, followed by another experiment fitting the CMA-ES babbling data to a mixture model and sampling from it. 200 samples from each mixture model are evaluated on the robot, and the mean and standard deviation of the reaching error and of the pressure variance are plotted. As illustrated in Fig. 7, CMA-ES outperforms the baseline in terms of both larger muscle pressure variance and smaller goal-reaching error, where the average error lies close to the 2 cm prototype sphere radius. Since online goal babbling favors kinematic and direction efficiency by reducing motor redundancy, its sampled muscle pressures generally vary only trivially compared to the ones generated from the CMA-ES GMM model, which expands the variance in search of the optimum while maintaining goal-reaching accuracy. Due to nonstationary changes in the possible posture configurations, the local neighboring samples may no longer lead to a position close to the goal; however, the neighboring goals can still be used for initializing the step size and the mean vector, adapting to nonstationary changes and maintaining goal-reaching accuracy while performing motor babbling.
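The mixture-model fitting, BIC-based model selection, and posture sampling used here can be sketched as follows; the use of scikit-learn, the function name, the component range, and the two-cluster toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_posture_gmm(Q, max_components=5, seed=0):
    """Fit Gaussian mixtures to babbled postures and keep the component
    count with the lowest BIC, a sketch of the model selection described
    in Section V-B."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed).fit(Q)
        bic = gmm.bic(Q)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

rng = np.random.default_rng(0)
# toy stand-in for collected postures: two clusters in a 24-D muscle space
Q = np.vstack([rng.normal(0.2, 0.02, size=(100, 24)),
               rng.normal(0.4, 0.02, size=(100, 24))])
gmm = fit_posture_gmm(Q)
new_postures, _ = gmm.sample(10)
```

Sampling from the fitted mixture yields new posture configurations drawn from the captured abundance, which is how the 200 evaluated postures per goal are generated.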
It can be observed that for goal 44, which is closest to the home position, the motor variance does not increase as much as for other queried goals. This is because every time an interpolated directed goal path crosses the centroid home region, goal 44 has a higher chance of collecting more samples of varied motor configurations within its neighborhood. Nevertheless, CMA-ES still explores motor redundancy rather efficiently. As shown in Fig. 8, the evolution trial expands the maximum and minimum standard deviations of the search until the optimum is reached. After 5 such evolution trials, the sampled GMM data is used to estimate the covariance, compared with the covariance estimate from the baseline GMM data. As shown in Fig. 9, CMA-ES preserves the structure while enhancing the variance on the diagonal, and also discovers more correlation within different groups of muscles, which can be prominently observed on the robot in Fig. 10.
V-D Interpreting Muscle Abundance
#   Muscle Name        Function
3   serratus anterior  pulls the scapula forward
7   latissimus dorsi   rotates the scapula downward
8   rear deltoid       |
10  front deltoid      | abduct, flex, and extend the shoulder
11  medial deltoid     |
15  biceps brachii     flexes and supinates the forearm
18  brachialis         flexes the elbow
19  pronator           pronates the hand
20  supinator          supinates the hand
The muscle pressure variability in the covariance encodes muscle abundance, which can be interpreted as muscle stiffness and static muscle synergies. Loosely speaking, a muscle synergy is defined as a coactivation pattern of muscles in a certain movement driven by a single neural command signal [19]. It can be argued that muscle synergies are a way of kinetically constraining the redundant motor control to limited DoFs, or neural strategies for handling the sensorimotor systems [20].
We take no side in this debate on human sensorimotor learning; however, by constraining the end effector position of the musculoskeletal robot arm, the static muscle synergies and stiffness can be encoded in the covariance matrix and provide some useful insights. In Fig. 9, the muscles of high variance, namely muscles 3, 7, 8, 10, 11, 15, 18, 19, and 20, are of particular interest, where muscles 7 and 8, as well as 20 and 8, are highly negatively correlated. Inspired by the human upper limb, the PAMs of the robot arm mimic the function of human arm muscles, as illustrated in Table I. By fitting the data with the mixture models and subsequently sampling, we can observe the coactivation patterns of the muscles. As shown in Fig. 10, the upper limb first reaches goal 44 with a relaxed arm posture and a lowered, adducted shoulder, whereas later the end effector position is maintained by stiffening the arm, lifting the extended shoulder, and pronating the hand. The negative correlation of muscles 7 and 8 can be interpreted as the coordination of extension and abduction, as well as flexion and adduction, of the shoulder. Muscles 8 and 20 coordinate shoulder abduction with a supinating hand, and shoulder adduction with a pronating hand.
VI Conclusions
We have implemented directed goal babbling [11] to learn the inverse kinematics of a 10 DoF musculoskeletal robot arm actuated by 24 PAMs. We defined the goal space by empirically sampling 2000 random postures and forcing a convex hull ready for learning, and post-processed the goal space to remove outlier goals. The result shows an average reaching error of 1.8 cm, where the reaching accuracy achievable by the robot is 1.2 cm. The simple heuristics and approximation of the goal space allow us to use directed goal babbling to learn a larger sensory space compared to a well-defined yet small partial task space, and promote a more efficient mapping to the motor space compared to active exploration [10]. Nevertheless, learning with a forced convex goal space where the intrinsic task space is nonconvex introduces outlier goals, for which the corresponding directed babbling can be misleading. A future research direction of integrating directed goal babbling with active exploration could be of interest, where the goal space grid is defined large enough to encapsulate the whole task space, and active exploration guided by k-d tree splitting and progress logging indicates the learned task space while keeping the bootstrapping flavor of the inverse model.
We further extended directed goal babbling to local online motor babbling using CMA-ES in search of more motor abundance. By initializing the evolution strategy with local samples generated from goal babbling, any point within the goal space can be queried for motor abundance. The idea is to intentionally initialize the mean vector of CMA-ES slightly away from the queried goal. By expanding the covariance and setting the stop condition to meeting the set optimum of the objective function value, efficient motor babbling data can be generated locally around the queried goal with a few CMA-ES trials initialized differently from the neighboring goals. We evenly selected 10 goals within the goal space to showcase the generality of local online motor babbling. The results show that our proposed method significantly increases the average muscle pressure variances, while keeping the end effector more stable and closer to the queried goals, compared with the goal babbling baseline. Even in the home position, where motor abundance has already been well explored, local motor babbling shows a maximum increase in standard deviation of 0.1 MPa, constituting 25% of the muscle pressure actuation range. Our method also adapts to queried goals near the edges of the goal space, where samples for initialization are sparse due to the strained postures of the robot arm around such goals.
By fitting Gaussian mixture models to the data collected using local motor babbling, sampling from the GMMs can reproduce motor abundance in terms of the muscle stiffness and muscle synergies encoded in the evolved single-mode covariance matrix. Muscle stiffness can be seen in the inflating and deflating muscles, and muscle synergies can be clearly observed in the covariance where variances and correlations are strong, as well as when GMM-sampled postures are applied on the robot. A bonus of the encoded covariance and mixture models is that the queried motor abundance can be captured and reproduced by distributions, which enables the formulation of trials for reinforcement learning in future research, such as learning weight lifting with varied muscle stiffness, and planning trajectories and learning dynamics using via-points and the locally queried motor abundance library.
Acknowledgment
The author would like to thank Hiroaki Masuda for his mechanical maintenance of the upper limb robot, and Dr. Matthias Rolf for his suggestions in implementing directed online goal babbling.
References
 [1] R. P. Paul, Robot manipulators: mathematics, programming, and control: the computer control of robot manipulators. Richard Paul, 1981.
 [2] D. Nguyen-Tuong and J. Peters, “Model learning for robot control: a survey,” Cognitive Processing, vol. 12, no. 4, pp. 319–340, 2011.
 [3] M. Rolf and J. J. Steil, “Efficient exploratory learning of inverse kinematics on a bionic elephant trunk,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 6, pp. 1147–1160, 2014.
 [4] K. Hosoda, S. Sekimoto, Y. Nishigori, S. Takamuku, and S. Ikemoto, “Anthropomorphic muscular–skeletal robotic upper limb for understanding embodied intelligence,” Advanced Robotics, vol. 26, no. 7, pp. 729–744, 2012.
 [5] R. Niiyama, S. Nishikawa, and Y. Kuniyoshi, “Biomechanical approach to open-loop bipedal running with a musculoskeletal athlete robot,” Advanced Robotics, vol. 26, no. 3-4, pp. 383–398, 2012.
 [6] A. Hitzmann, H. Masuda, S. Ikemoto, and K. Hosoda, “Anthropomorphic musculoskeletal 10 degrees-of-freedom robot arm driven by pneumatic artificial muscles,” Advanced Robotics, vol. 32, no. 15, pp. 865–878, 2018.
 [7] P. Gaudiano and S. Grossberg, “Vector associative maps: Unsupervised real-time error-based learning and control of movement trajectories,” Neural Networks, vol. 4, no. 2, pp. 147–183, 1991.
 [8] Y. Demiris and A. Dearden, “From motor babbling to hierarchical learning by imitation: a robot developmental pathway,” 2005.
 [9] A. D’Souza, S. Vijayakumar, and S. Schaal, “Learning inverse kinematics,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), vol. 1. IEEE, 2001, pp. 298–303.
 [10] A. Baranes and P.-Y. Oudeyer, “Active learning of inverse models with intrinsically motivated goal exploration in robots,” Robotics and Autonomous Systems, vol. 61, no. 1, pp. 49–73, 2013.
 [11] M. Rolf, J. J. Steil, and M. Gienger, “Online goal babbling for rapid bootstrapping of inverse models in high dimensions,” in Development and Learning (ICDL), 2011 IEEE International Conference on, vol. 2. IEEE, 2011, pp. 1–8.
 [12] M. Latash, “There is no motor redundancy in human movements. there is motor abundance,” 2000.
 [13] M. L. Latash, “The bliss (not the problem) of motor abundance (not redundancy),” Experimental brain research, vol. 217, no. 1, pp. 1–5, 2012.
 [14] P. Singh, S. Jana, A. Ghosal, and A. Murthy, “Exploration of joint redundancy but not task space variability facilitates supervised motor learning,” Proceedings of the National Academy of Sciences, vol. 113, no. 50, pp. 14 414–14 419, 2016.
 [15] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
 [16] N. Hansen, “The CMA evolution strategy: A tutorial,” arXiv preprint arXiv:1604.00772, 2016.
 [17] N. Hansen, Y. Akimoto, and P. Baudis, “CMA-ES/pycma on Github,” Zenodo, DOI:10.5281/zenodo.2559634, Feb. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.2559634
 [18] C. M. Bishop, Pattern recognition and machine learning. springer, 2006.
 [19] G. TorresOviedo, J. M. Macpherson, and L. H. Ting, “Muscle synergy organization is robust across a variety of postural perturbations,” Journal of neurophysiology, 2006.
 [20] M. C. Tresch and A. Jarc, “The case for and against muscle synergies,” Current opinion in neurobiology, vol. 19, no. 6, pp. 601–607, 2009.