Generating Shared Latent Variables for Robots to Imitate Human Movements and Understand their Physical Limitations

10/11/2018 ∙ by Maxime Devanne, et al.

Assistive robotics, and particularly robot coaches, may be very helpful for rehabilitation healthcare. In this context, we propose a method based on the Gaussian Process Latent Variable Model (GP-LVM) to transfer knowledge between a physiotherapist, a robot coach and a patient. Our model maps visual human body features to robot data in order to facilitate robot learning and imitation. In addition, we propose to extend the model to adapt the robot's understanding to the patient's physical limitations during the assessment of rehabilitation exercises. Experimental evaluation demonstrates promising results for both robot imitation and model adaptation to the patients' limitations.




1 Introduction

Low back pain is a leading cause of disability, particularly affecting the elderly, whose proportion in European societies keeps rising, raising growing concern about healthcare. 50 to 80% of the world population suffers from back pain at some point in their lives, which makes it the most frequent health problem [1]. To tackle chronic low back pain, regular physical rehabilitation exercises are considered the most effective approach [2].

With this perspective, solutions are being developed based on assistive technology and particularly robotics [3, 4, 5], where humanoid robots demonstrate rehabilitation exercises to patients after having learned them from a physiotherapist. However, due to the different morphologies of humans and robots, and to the possible physical limitations of patients, human motion may be difficult for a robot to interpret. In this work, we address these issues by training a common low-dimensional latent space shared between the therapist, the robot coach and the patients, as illustrated in Fig. 1 (left). This model allows us to learn an ideal rehabilitation exercise from physiotherapist demonstrations, which is difficult to do directly from raw human data. Moreover, this ideal motion representation is easily interpreted by the robot coach, which can then reproduce the correct exercise for the patient. Finally, the model is also used to adapt the robot's understanding and analysis to the possible physical limitations of the patients attending the rehabilitation session.

Figure 1: (Left) Overview of the approach. (Right) Schema of the different GP-LVM models.

2 Related Work

In the literature, the challenges of robot imitation and motion assessment by robot coaches are usually addressed separately.

In the context of robot imitation, several vision-based approaches have been proposed. Riley et al. [6] proposed an approach for real-time control of a humanoid by imitation. The imitation relies on a stereo vision system that records human trajectories by tracking color markers attached to the demonstrator's upper body. The authors apply inverse kinematics (IK) to estimate the human's joint angles and then map them to the robot. Dariush et al. [7] presented an online task-space, control-theoretic retargeting formulation to generate robot joint motions that respect the robot's joint limit, joint velocity and self-collision constraints. The inputs to their method are low-dimensional normalized human motion descriptors, detected and tracked using a vision-based key-point detection and tracking algorithm. Koenemann et al. [8] presented a system that enables humanoid robots to imitate complex whole-body human motions in real time. The system uses a compact human model and considers the positions of the end effectors and the center of mass as the most important aspects to imitate. Stanton et al. [9] used machine learning to train neural networks that map sensor data to joint space. However, these last two approaches rely on a human motion capture system rather than vision features to capture human motion, which makes them unsuitable for real-world scenarios such as physical rehabilitation.

Only a few approaches have addressed the challenge of physical rehabilitation through robot coaching systems. While several studies showed the potential of virtual agents [10, 11] and physical robots [12] to enhance engagement and learning in health, physical activity or social contexts, Fasola et al. [13] showed that elderly subjects evaluated a physical robot coach more favorably than virtual systems. Robots for coaching physical exercises have recently been presented [14, 15, 16]. These approaches employ robots with few degrees of freedom, which facilitates the imitation process but does not allow realistic movements. Moreover, Obo et al. [16] did not provide any feedback or active guidance to the patient.

In this paper, we employ a humanoid robot with many degrees of freedom called Poppy [17] and capture human motion using a Kinect sensor with a skeleton-tracking algorithm operating on depth images. We propose a method that simultaneously addresses the challenges of robot imitation and human motion assessment in a physical rehabilitation context.

3 Proposed Approach

3.1 Shared Gaussian Process Latent Variable Model

Our goal is to learn a latent space where we can represent and compare both human and robot poses. Human upper-body poses are characterized by skeletons captured with a Kinect sensor, which provides the 3D positions of a set of joints. A human pose is thus defined as a vector $\mathbf{y}_H \in \mathcal{Y}_H$, where $\mathcal{Y}_H$ denotes the human space. Robot poses are characterized by the motor angles of the Poppy robot, so a robot pose is defined as $\mathbf{y}_R \in \mathcal{Y}_R$, where $\mathcal{Y}_R$ denotes the robot space. To learn such a shared latent space $\mathcal{X}$, we employ the shared Gaussian Process Latent Variable Model [18].

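GP-LVM [19] (see Fig. 1 (right)) is a probabilistic model that maps a low-dimensional latent space to high-dimensional observed data using a Gaussian process with zero mean and a covariance function characterized by a kernel $k$: $\mathbf{y} = f(\mathbf{x}) + \boldsymbol{\epsilon}$, with $f \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}'))$. For the kernel $k$, we adopt the popular Radial Basis Function (RBF). The shared GP-LVM extends GP-LVM to multiple observation spaces that share a common latent space. In our work, we have two observation spaces, the human space $\mathcal{Y}_H$ and the robot space $\mathcal{Y}_R$. Given a training set of $N$ human poses $\mathbf{Y}_H = [\mathbf{y}_H^{1}, \dots, \mathbf{y}_H^{N}]^\top$ and corresponding robot poses $\mathbf{Y}_R = [\mathbf{y}_R^{1}, \dots, \mathbf{y}_R^{N}]^\top$, two Gaussian process mappings $f_H: \mathcal{X} \to \mathcal{Y}_H$ and $f_R: \mathcal{X} \to \mathcal{Y}_R$ are defined from the latent space $\mathcal{X}$ to the observed spaces:

$$p(\mathbf{Y}_H \mid \mathbf{X}, \theta_H) = \prod_{d=1}^{D_H} \mathcal{N}\big(\mathbf{Y}_H^{:,d} \mid \mathbf{0}, \mathbf{K}_H\big), \qquad p(\mathbf{Y}_R \mid \mathbf{X}, \theta_R) = \prod_{d=1}^{D_R} \mathcal{N}\big(\mathbf{Y}_R^{:,d} \mid \mathbf{0}, \mathbf{K}_R\big),$$

where $\mathbf{X} = [\mathbf{x}^{1}, \dots, \mathbf{x}^{N}]^\top$ are the latent locations, $D_H$ and $D_R$ are the dimensions of the human and robot spaces, $\mathbf{Y}^{:,d}$ denotes the $d$-th column, and $\mathbf{K}_H$ and $\mathbf{K}_R$ are RBF kernel matrices with hyperparameters $\theta_H$ and $\theta_R$. In the shared GP-LVM, the optimal latent locations $\mathbf{X}$ are unknown and must be learned together with the hyperparameters of the mappings $f_H$ and $f_R$. This is done by optimizing the joint marginal likelihood $p(\mathbf{Y}_H, \mathbf{Y}_R \mid \mathbf{X}, \theta_H, \theta_R) = p(\mathbf{Y}_H \mid \mathbf{X}, \theta_H)\, p(\mathbf{Y}_R \mid \mathbf{X}, \theta_R)$. We are interested in mapping data from the human space to the robot space through the latent space. Hence, an inverse mapping from the human space to the latent space is required. For that purpose, back constraints are introduced [21]. They define the latent locations as a function of the observed data, $x_{nj} = g_j(\mathbf{y}_H^{n}; \mathbf{W}) = \sum_{m=1}^{N} w_{jm}\, k_{bc}(\mathbf{y}_H^{n}, \mathbf{y}_H^{m})$, where $g$ is an RBF function parameterized by the weights $\mathbf{W}$. These weights are learned during the optimization process instead of the latent locations:

$$\{\mathbf{W}^{*}, \theta_H^{*}, \theta_R^{*}\} = \arg\max_{\mathbf{W}, \theta_H, \theta_R} \log p\big(\mathbf{Y}_H, \mathbf{Y}_R \mid \mathbf{X} = g(\mathbf{Y}_H; \mathbf{W}), \theta_H, \theta_R\big)$$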
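The following NumPy sketch illustrates the objective above under stated assumptions; it is not the authors' implementation. It evaluates the negative log joint marginal likelihood of a back-constrained shared GP-LVM, assuming zero-mean observations, an RBF kernel of the form alpha * exp(-gamma/2 * ||x - x'||^2) plus a noise term 1/beta on the diagonal, and an RBF back-constraint kernel of fixed width. All function names, hyperparameter values and the toy data are hypothetical. In practice this quantity would be minimized with respect to W, theta_H and theta_R by a gradient-based optimizer rather than evaluated once.

```python
import numpy as np


def rbf_kernel(A, B, alpha=1.0, gamma=1.0):
    """RBF kernel alpha * exp(-gamma / 2 * ||a - b||^2) between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # clip tiny negative values caused by floating-point round-off
    return alpha * np.exp(-0.5 * gamma * np.maximum(sq_dists, 0.0))


def back_constraint(Y_query, Y_centres, W, gamma_bc=1.0):
    """Latent points x_n = sum_m W[m] * k_bc(y_n, y_m), i.e. X = g(Y; W)."""
    return rbf_kernel(Y_query, Y_centres, gamma=gamma_bc) @ W


def gp_neg_log_likelihood(Y, X, alpha, gamma, beta):
    """-log p(Y | X): D independent GP outputs sharing K = K_rbf(X, X) + I / beta."""
    N, D = Y.shape
    K = rbf_kernel(X, X, alpha, gamma) + np.eye(N) / beta
    L = np.linalg.cholesky(K)
    Kinv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y via the Cholesky factor
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (D * N * np.log(2.0 * np.pi) + D * log_det_K + np.sum(Y * Kinv_Y))


def shared_neg_log_likelihood(Y_H, Y_R, W, theta_H, theta_R, gamma_bc=1.0):
    """Negative log joint marginal likelihood with back-constrained latent points X = g(Y_H; W)."""
    X = back_constraint(Y_H, Y_H, W, gamma_bc)
    return gp_neg_log_likelihood(Y_H, X, *theta_H) + gp_neg_log_likelihood(Y_R, X, *theta_R)


# Toy usage: 20 paired poses, 9-D human features, 4 motor angles, 2-D latent space
rng = np.random.default_rng(0)
Y_H = rng.normal(size=(20, 9))
Y_R = rng.normal(size=(20, 4))
W = 0.1 * rng.normal(size=(20, 2))
theta = (1.0, 1.0, 100.0)  # (alpha, gamma, beta), illustrative values only
print(shared_neg_log_likelihood(Y_H, Y_R, W, theta, theta))
```

Because both likelihood terms are evaluated at the same latent points X = g(Y_H; W), a gradient step on W moves the latent space so that it explains human and robot poses jointly, which is what makes the human-to-robot mapping through the latent space possible.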


As body parts can move concurrently and independently, we learn a separate shared latent space for each body part. We use three 2D latent spaces: one for each arm and one for the spine. Our approach can therefore be extended to the lower body simply by adding latent spaces for the left and right legs.

Figure 2: (left) Three rehabilitation exercises represented in the 2D latent space of the left arm. (right) Corresponding human and robot poses at locations A, B, C and D.

3.2 Gaussian Mixture Model on the Latent Space

Once the shared latent space is trained, we learn a Gaussian Mixture Model (GMM) on this low-dimensional space. This allows us to learn an ideal movement from therapist demonstrations projected onto the shared space, which can then be used for robot imitation by projecting the ideal movement back into the robot space. Given $M$ therapist demonstrations $\{\mathbf{Y}_H^{(m)}\}_{m=1}^{M}$, the GMM on the latent space is defined as $p(\boldsymbol{\xi}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\boldsymbol{\xi} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, where $\boldsymbol{\xi}$ encodes the human pose projected onto the shared latent space, $K$ is the number of Gaussians, $\pi_k$ is the weight of the $k$-th Gaussian, and $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are the mean and covariance matrix of the $k$-th Gaussian. The parameters $\pi_k$, $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are learned using Expectation-Maximization (EM). Once a model is learned for each exercise, we generate an optimal sequence using Gaussian Mixture Regression (GMR), which approximates the sequence by a single Gaussian $\mathcal{N}(\hat{\boldsymbol{\mu}}_t, \hat{\boldsymbol{\Sigma}}_t)$ at each time step $t$. This optimal sequence is then projected to the robot space so that the robot imitates the expert and demonstrates the exercise to the patient.
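The GMR step described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes, as is common for trajectory generation, that each mixture vector stacks a scalar time index t with the 2D latent point x (the source only states that the vector encodes the projected pose), and that the GMM parameters have already been fitted by EM, for example with a library such as scikit-learn. The function name `gmr` is hypothetical. Each time step is summarized by one Gaussian obtained by moment matching the conditional mixture.

```python
import numpy as np


def gmr(times, priors, means, covs):
    """Gaussian Mixture Regression of latent points x given time t.

    priors: (K,), means: (K, 1 + q), covs: (K, 1 + q, 1 + q).
    Dimension 0 of each component is time t; dimensions 1: are the latent coordinates x.
    Returns the mean (T, q) and covariance (T, q, q) of one Gaussian per query time.
    """
    K, dim = means.shape
    q = dim - 1
    var_t = covs[:, 0, 0]                                   # (K,) time variances
    s_xt = covs[:, 1:, 0]                                   # (K, q) latent/time covariances
    # conditional covariance of x given t does not depend on t
    comp_sigma = covs[:, 1:, 1:] - np.einsum("ki,kj->kij", s_xt, s_xt) / var_t[:, None, None]
    mu_out = np.zeros((len(times), q))
    sigma_out = np.zeros((len(times), q, q))
    for i, t in enumerate(times):
        # log of pi_k * N(t | mu_k^t, Sigma_k^tt), computed in log space to avoid underflow
        log_h = (np.log(priors) - 0.5 * (t - means[:, 0]) ** 2 / var_t
                 - 0.5 * np.log(2.0 * np.pi * var_t))
        h = np.exp(log_h - log_h.max())
        h /= h.sum()
        # conditional mean of x given t for every component
        comp_mu = means[:, 1:] + s_xt * ((t - means[:, 0]) / var_t)[:, None]
        # moment-match the conditional mixture with a single Gaussian
        mu = h @ comp_mu
        diff = comp_mu - mu
        mu_out[i] = mu
        sigma_out[i] = np.einsum("k,kij->ij", h, comp_sigma) + np.einsum("k,ki,kj->ij", h, diff, diff)
    return mu_out, sigma_out
```

Evaluating `gmr` on a grid of time steps yields the ideal latent trajectory; each mean can then be mapped to motor angles through the latent-to-robot Gaussian process.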

3.3 Transferring Knowledge from Therapist to Patient

In our rehabilitation scenario, the robot coach needs to evaluate the patient's movement, captured with a Kinect sensor in the same way as the therapist's movement. However, patients needing rehabilitation are often constrained by physical limitations or pain while performing exercises. This may result in an incorrect performance even when they do their best to perform the exercise correctly. A robust and effective robot coach system must take such limitations into account. We therefore propose to extend the learned shared GP-LVM (see Fig. 1 (right)) by considering two distinct human pose spaces, $\mathcal{Y}_T$ and $\mathcal{Y}_P$, for the therapist and the patient, respectively. $\mathcal{Y}_T$ is equivalent to the human space $\mathcal{Y}_H$ described above, whereas $\mathcal{Y}_P$ differs from $\mathcal{Y}_T$ in its inverse mapping to the latent space. Specifically, a therapist pose and the corresponding patient pose with physical limitations must be represented by the same point in the latent space. To achieve this, the weight matrix $\mathbf{W}$ of the inverse (back-constraint) mapping is updated into a patient-specific matrix $\mathbf{W}_P$. Let $\mathbf{Y}_P$ be a patient's performance of an exercise and $\mathbf{X}_T$ the corresponding ideal demonstration of the same exercise projected onto the latent space. The optimization becomes:

$$\mathbf{W}_P^{*} = \arg\min_{\mathbf{W}_P} \big\| g(\mathbf{Y}_P; \mathbf{W}_P) - \mathbf{X}_T \big\|^2$$

The patient-specific weight matrix $\mathbf{W}_P$ is optimized using gradient descent. Fig. 3 shows a patient's sequence in the latent space before (red) and after (green) the update, in comparison to the therapist's ideal sequence (blue).
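Because the RBF back-constraint mapping is linear in its weights, the patient-specific update can be sketched as gradient descent on a squared error between the patient's back-projected sequence and the therapist's ideal latent sequence. This is a minimal sketch under assumptions: the patient and ideal sequences are taken to be time-aligned frame by frame, the back-constraint centres are the therapist's training poses, the objective is averaged over frames, and a fixed step size equal to the inverse Lipschitz constant is used. The name `fit_patient_weights` and the toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel exp(-gamma / 2 * ||a - b||^2) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * gamma * np.maximum(sq, 0.0))


def fit_patient_weights(Y_P, Y_centres, X_T, W_init, n_iters=500, gamma=1.0):
    """Gradient descent on E(W) = ||K_P W - X_T||_F^2 / N with K_P = k(Y_P, Y_centres).

    Y_P: patient poses (N, D), time-aligned with the ideal latent sequence X_T (N, q).
    Y_centres: back-constraint centres (M, D); W_init: therapist weights (M, q).
    """
    K_P = rbf_kernel(Y_P, Y_centres, gamma)        # fixed design matrix (N, M)
    N = Y_P.shape[0]
    # step size 1 / L, where L = 2 * sigma_max(K_P)^2 / N is the gradient's Lipschitz constant
    lr = N / (2.0 * np.linalg.norm(K_P, 2) ** 2)
    W = W_init.copy()
    losses = []
    for _ in range(n_iters):
        residual = K_P @ W - X_T                   # (N, q)
        losses.append(np.sum(residual**2) / N)
        W -= lr * (2.0 / N) * K_P.T @ residual     # gradient of E with respect to W
    return W, losses


# Toy usage: patient poses are slightly perturbed versions of the therapist poses
rng = np.random.default_rng(1)
Y_centres = rng.normal(size=(20, 6))
Y_P = Y_centres + 0.1 * rng.normal(size=(20, 6))
X_T = rbf_kernel(Y_P, Y_centres) @ rng.normal(size=(20, 2))
W_P, losses = fit_patient_weights(Y_P, Y_centres, X_T, np.zeros((20, 2)))
print(losses[0], losses[-1])
```

With the step size set to 1/L, each iteration is guaranteed not to increase this convex quadratic objective; a closed-form least-squares solve would also work here, but gradient descent mirrors the procedure named in the text.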

Figure 3: (left) A wrong exercise in the latent space before (red) and after (green) the model update. (right) Corresponding human and robot poses at points A, B, and C.

4 Experimental Results

We evaluate our method on three rehabilitation exercises selected in cooperation with physiotherapists. Each exercise is performed three times by two subjects, playing the role of the physiotherapist and the patient, respectively (videos are available on …). In addition, the subjects perform incorrect exercises by simulating errors (videos are available on …). For the first exercise, the arms are not raised enough. For the second exercise, the subject does not tilt the arm and keeps it straight. For the third exercise, the arms are not raised enough.

To obtain ideal robot movements, a physiotherapist physically manipulates the robot to perform each desired rehabilitation movement while we record the motor angle positions along the motion. We record one ideal movement per exercise. In addition, robot movements reproducing the errors described above are also recorded. These robot movements are used to train the shared GP-LVM and serve as ground truth during evaluation.

4.1 Imitation Evaluation

We first evaluate the ability of the approach to perform robot imitation. As described in Section 3.2, an ideal motion is generated by applying GMR to the GMM learned on the latent space from the expert demonstrations. This ideal motion is then projected back to the robot space and compared to the ground truth. We compute the average RMSE of the motor angles between the sampled sequence and the ground truth. We also normalize the RMSE by the standard deviation of the motor angles for each exercise, so that the error can be compared with the variability of the robot's motion. Results are reported for each exercise in Table 1.


                  Exercise 1   Exercise 2   Exercise 3   Mean
RMSE (degrees)    7.1          6.9          6.1          6.7
Normalized RMSE   0.31         0.18         0.34         0.28
Table 1: Robot imitation results.

We obtain a mean RMSE of 6.7 degrees, corresponding to … of the total range of Poppy motor angles. In addition, we obtain a mean normalized RMSE of 0.28, showing that the error is much lower than the standard deviation of the rehabilitation movements, which reflects the noise and variations within the exercise. This supports the ability of the proposed model to imitate the therapist's demonstrations accurately enough for the exercise to be clearly understood by the patient.
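The two metrics in Table 1 can be sketched as follows. The source does not specify every detail, so this sketch assumes that the generated and ground-truth sequences are already time-aligned arrays of motor angles in degrees, and that the normalization uses the standard deviation of the ground-truth motor angles pooled over all frames and motors. The helper names are hypothetical; the last two lines only recompute the Mean column of Table 1 from the per-exercise values.

```python
import numpy as np


def rmse(pred, truth):
    """Root-mean-square error over all frames and motors (same unit as the inputs, e.g. degrees)."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def normalized_rmse(pred, truth):
    """RMSE divided by the standard deviation of the ground-truth motor angles."""
    return rmse(pred, truth) / float(np.std(np.asarray(truth, dtype=float)))


# Toy sequence: 2 frames x 2 motors, prediction offset by 3 degrees everywhere
truth = np.array([[0.0, 10.0], [20.0, 30.0]])
pred = truth + 3.0
print(rmse(pred, truth))                       # 3.0
print(round(normalized_rmse(pred, truth), 3))  # 3 / sqrt(125), i.e. 0.268

# Consistency check of the Mean column in Table 1
print(round(float(np.mean([7.1, 6.9, 6.1])), 1))     # 6.7
print(round(float(np.mean([0.31, 0.18, 0.34])), 2))  # 0.28
```

A normalized RMSE well below 1 means the imitation error is smaller than the natural spread of the motor angles within the exercise, which is how the 0.28 figure is interpreted above.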

4.2 Therapist-Patient Transfer Evaluation

We then evaluate the ability of our model to transfer knowledge between a therapist and a patient with physical limitations. We first project the incorrect sequence into the shared latent space. We then project the sequence back to the robot space, before and after applying the weight update described in Section 3.3. To assess the robustness of the approach, we sample ten random sequences from the latent-to-robot Gaussian process mapping and compute their RMSE with respect to the ground truth. The average RMSE and its standard deviation over the ten sampled sequences are computed. For comparison, we also compute these RMSE values for the patient's correct sequences. Results are reported in Table 2.

Exercise type   Exercise 1   Exercise 2   Exercise 3
Incorrect before update
Incorrect after update
Table 2: Therapist-Patient transfer results.

We first observe that, as expected, RMSE values are much higher for incorrect exercises than for correct exercises. However, if we consider that these errors are due to the patient's physical limitations and apply our update method, the RMSE values become close to those of correct exercises. This means that the robot interprets the incorrect exercises similarly to the correct ones. In addition, we deepen the analysis of the third exercise by evaluating, with the previously trained model, a different kind of error (the arms are not outstretched enough). We obtain RMSE values of … and … before and after the update, respectively. These similar values show that updating the model for one kind of error does not affect other types of errors, as required in our rehabilitation scenario.
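The robustness protocol of Section 4.2, drawing ten robot sequences from the latent-to-robot Gaussian process and reporting the mean and standard deviation of their RMSE, can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses a zero-mean GP with an RBF kernel and Gaussian noise with fixed, illustrative hyperparameters, draws each sample from the joint posterior over all frames of the latent path, and treats the motor dimensions as independent outputs sharing one covariance. `X_seq` stands for the latent projection of a patient sequence (before or after the weight update); all names and the toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, alpha=1.0, gamma=1.0):
    """RBF kernel alpha * exp(-gamma / 2 * ||a - b||^2) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return alpha * np.exp(-0.5 * gamma * np.maximum(sq, 0.0))


def gp_posterior(X_train, Y_train, X_query, alpha=1.0, gamma=1.0, beta=100.0):
    """Posterior mean (N*, D) and covariance (N*, N*) of the latent-to-robot GP at X_query."""
    K = rbf_kernel(X_train, X_train, alpha, gamma) + np.eye(len(X_train)) / beta
    K_s = rbf_kernel(X_query, X_train, alpha, gamma)
    mean = K_s @ np.linalg.solve(K, Y_train)
    cov = rbf_kernel(X_query, X_query, alpha, gamma) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, 0.5 * (cov + cov.T)  # symmetrize against round-off


def sampled_rmse(X_train, Y_train, X_seq, Y_truth, n_samples=10, seed=0):
    """Mean and standard deviation of the RMSE of n_samples sequences drawn along X_seq."""
    rng = np.random.default_rng(seed)
    mean, cov = gp_posterior(X_train, Y_train, X_seq)
    # small jitter keeps the Cholesky factorization stable
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(X_seq)))
    errors = []
    for _ in range(n_samples):
        # each motor dimension is an independent draw sharing the same temporal covariance
        sample = mean + L @ rng.standard_normal(mean.shape)
        errors.append(np.sqrt(np.mean((sample - Y_truth) ** 2)))
    return float(np.mean(errors)), float(np.std(errors))


# Toy usage: 2-D latent path and 3 motor angles; evaluate on every other training frame
t = np.linspace(0.0, 1.0, 20)
X_train = np.column_stack([t, t**2])
Y_train = np.column_stack([np.sin(3 * t), np.cos(3 * t), t])
print(sampled_rmse(X_train, Y_train, X_train[::2], Y_train[::2]))
```

Sampling from the joint posterior, rather than adding independent noise per frame, keeps each drawn sequence temporally smooth, so the spread of the ten RMSE values reflects the model's uncertainty along the whole motion.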

5 Conclusions

We have proposed a method based on the Gaussian Process Latent Variable Model for a robot coach system in physical rehabilitation. The method learns a latent space shared between the therapist and the robot to facilitate robot learning and imitation. The model is then extended to account for variations due to patients' physical limitations. This allows the robot to understand and assess the patient's performance independently of their physical limitations. Experimental evaluation demonstrates the effectiveness of our approach for both robot imitation and model adaptation.

In the future, we plan to extend our experimental evaluation with more data acquired in real-world environments. Moreover, we would like to investigate the use of key poses instead of full motion sequences during model training, which would be better suited to a real-world rehabilitation scenario.

6 Acknowledgement

The research work presented in this paper is partially supported by the EU FP7 grant ECHORD++ KERAAL, by the European Regional Fund (FEDER) via the VITAAL Contrat Plan Etat Region and by project AMUSAAL funded by Region Brittany, France.


  • [1] WHO Scientific Group on the Burden of Musculoskeletal Conditions at the Start of the New Millennium: The burden of musculoskeletal conditions at the start of the new millennium. World Health Organization technical report series 919 (2003)  i
  • [2] Kent, P., Kjaer, P.: The efficacy of targeted interventions for modifiable psychosocial risk factors of persistent nonspecific low back pain–a systematic review. Manual therapy 17(5) (2012) 385–401
  • [3] Devanne, M., Mai, N.S.: Multi-level motion analysis for physical exercises assessment in kinaesthetic rehabilitation. In: IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids). (November 2017)
  • [4] Gorer, B., Salah, A.A., Akın, H.L.: An autonomous robotic exercise tutor for elderly people. Autonomous Robots 41(3) (7 2017) 657–678
  • [5] Devanne, M., et al.: A co-design approach for a rehabilitation robot coach for physical rehabilitation based on the error classification of motion errors. In: Second IEEE International Conference on Robotic Computing (IRC). (January 2018)
  • [6] Riley, M., et al.: Enabling real-time full-body imitation: a natural way of transferring human movement to humanoids. In: IEEE International Conference on Robotics and Automation (ICRA). (September 2003)
  • [7] Dariush, B., Gienger, M., et al.: Online transfer of human motion to humanoids. Int. Journal of Humanoid Robotics (IJHR) 6(2) (2009)
  • [8] Koenemann, J., Burget, F., Bennewitz, M.: Real-time imitation of human whole-body motions by humanoids. In: IEEE International Conference on Robotics and Automation (ICRA). (June 2014)
  • [9] Stanton, C., et al.: Teleoperation of a humanoid robot using full-body motion capture, example movements, and machine learning. In: Australasian Conference on Robotics and Automation (ACRA). (December 2012)
  • [10] Waltemate, T., Hülsmann, F., Pfeiffer, T., Kopp, S., Botsch, M.: Realizing a low-latency virtual reality environment for motor learning. In: Proceedings of ACM Symposium on Virtual Reality Software and Technology (VRST). (2015)
  • [11] Anderson, K., André, E., Baur, T., Bernardini, S., Chollet, M., Chryssafidou, E., Damian, I., Ennis, C., Egges, A., Gebhard, P., et al.: The tardis framework: intelligent virtual agents for social coaching in job interviews. In: Advances in computer entertainment. Springer (2013) 476–491
  • [12] Belpaeme, T., Baxter, P.E., Read, R., Wood, R., Cuayáhuitl, H., Kiefer, B., Racioppa, S., Kruijff-Korbayová, I., Athanasopoulos, G., Enescu, V., et al.: Multimodal child-robot interaction: Building social bonds. Journal of Human-Robot Interaction 1(2) (2012) 33–53
  • [13] Fasola, J., Mataric, M.: A socially assistive robot exercise coach for the elderly. Journal of Human-Robot Interaction 2(2) (2013) 3–32
  • [14] Görer, B., Ali Salah, A., Akın, H.L.: A robotic fitness coach for the elderly. In: 4th International Joint Conference, AmI 2013. (December 2013)
  • [15] Schneider, S., Kümmert, F.: Exercising with a humanoid companion is more effective than exercising alone. In: Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International Conference on, IEEE (2016) 495–501
  • [16] Obo, T., Loo, C.K., Kubota, N.: Imitation learning for daily exercise support with robot partner. In: Robot and Human Interactive Communication (RO-MAN), 2015 24th IEEE International Symposium on, IEEE (2015) 752–757
  • [17] Lapeyre, M.: Poppy: open-source, 3D printed and fully-modular robotic platform for science, art and education. PhD thesis, Université de Bordeaux (2014)
  • [18] Shon, A., et al.: Learning shared latent structure for image synthesis and robotic imitation. In: 18th International Conference on Neural Information Processing Systems. (December 2006)
  • [19] Lawrence, N.D.: Gaussian process latent variable models for visualisation of high dimensional data. In: Advances in neural information processing systems. (December 2006)
  • [20] Møller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6(4) (1993) 525–533
  • [21] Lawrence, N.D., Candela, J.Q.: Local distance preservation in the gp-lvm through back constraints. In: International Conference on Machine Learning (ICML). (December 2006)