
Manipulating Soft Tissues by Deep Reinforcement Learning for Autonomous Robotic Surgery

In robotic surgery, pattern cutting through a deformable material is a challenging research problem. The cutting procedure requires a robot to concurrently manipulate a scissor and a gripper to cut along a predefined contour trajectory on the deformable sheet. The gripper ensures cutting accuracy by pinning a point on the sheet and continuously tensioning this pinch point in different directions while the scissor is in action. The goal is to find a pinch point and a corresponding tensioning policy that minimize damage to the material and increase cutting accuracy, measured by the symmetric difference between the predefined contour and the cut contour. A previous study considers a single fixed pinch point during the course of cutting, which is inaccurate and unsafe when the contour trajectory is complex. In this paper, we examine the soft tissue cutting task using multiple pinch points, which imitates how humans cut. This approach does not, however, require a multi-gripper robot. We use a deep reinforcement learning algorithm to find an optimal tensioning policy for a pinch point. Simulation results show that the multi-point approach outperforms the state-of-the-art method in the soft pattern cutting task with respect to both accuracy and reliability.





I Introduction

In robotic surgery, manipulation of a deformable sheet, especially cutting along a predefined contour trajectory, is a critical task that has attracted significant research interest [1, 2, 3, 4, 5]. The pattern cutting task is one of the Fundamental Skills of Robotic Surgery (FSRS) because it minimizes surgeon errors, operation time, trauma, and expenses [6, 7, 8]. Furthermore, the deformable material is usually soft and elastic, which makes it difficult to perform a cutting procedure accurately [9]. Therefore, it is necessary to use a gripper [10, 11], which holds a point (the pinch point) on the sheet and tensions it along an allowable set of directions with a reasonable force while a surgical scissor cuts along the contour trajectory, as shown in Fig. 1.

In other words, the pattern cutting task involves two essential steps: 1) selecting a pinch point and 2) finding a tensioning policy for that pinch point. A previous study considers a single pinch point over the course of cutting, which is only efficient when the contour shape is simple. A complicated contour, by contrast, is better divided into several segments. In this case, the use of one pinch point is unsafe and significantly reduces cutting accuracy [12].

In this paper, we examine a multi-point approach and compare the accuracy with its counterpart. Because the robot has a single gripper, only one pinch point is used for tensioning and the others are pinned permanently in the setup phase. Finally, we use a deep reinforcement learning algorithm, namely Trust Region Policy Optimization (TRPO) [13], as in [12], to seek the optimal tensioning policy from a pinch point. The tensioning policy determines the tensioning direction based on the current state of the sheet, contour information, and cutting position.

Fig. 1: A surgical pattern cutting task.

In this study, we use the simulator described in [12] to evaluate the accuracy and reliability of the proposed approach and compare its performance with the state-of-the-art method [12]. For conciseness, we call the method described in [12] the Single-point Deep Reinforcement Learning Tensioning (SDRLT) method. Furthermore, the simulator is of practical relevance because the learned tensioning policy is re-evaluated on a well-known physical surgical system, the da Vinci Research Kit (dVRK) [14]. Simulation results show that the multi-point approach outperforms the SDRLT method and achieves, on average, 55% better accuracy than the non-tensioned baseline over a set of 14 multi-segment contours.

The paper makes the following contributions:

  1. This work provides the first study of using multiple pinch points in the pattern cutting task. The study shows the benefits of using multiple pinch points, particularly for complicated contours where a scissor is instructed to cut multiple segments to complete the whole contour trajectory.

  2. The proposed scheme outperforms the state-of-the-art method in the pattern cutting task and lays the groundwork for a number of research extensions, such as the use of multiple grippers, multiple scissors, and multi-layer pattern cutting in a 3D environment.

  3. The multi-point approach imitates human demonstrations while cutting. Therefore, the proposed scheme is useful for both practical application and theoretical analysis. Finally, the proposed scheme achieves high accuracy and reliability in the pattern cutting task.

The rest of the paper is organized as follows. Section II reviews recent advances in surgical automation and the benefits of using reinforcement learning in surgical tasks. Section III presents the preliminary background of the pattern cutting task and introduces our proposed scheme. Section IV discusses the experimental results of the proposed scheme, and Section V concludes the paper.

II Related Work

The use of robot-assisted surgery allows doctors to automate a complicated task with minimal errors and effort. A number of automation levels have been discussed extensively in the literature [15, 16, 17, 18, 19, 20, 21]. Specifically, early approaches required experts to create a model trajectory that is used to teach a robot to automatically complete a designated task. For example, Schulman et al. [22] use a trajectory transfer algorithm to transform human demonstrations into model trajectories. These trajectories are updated to adapt to new environment geometry and hence assist a robot in learning the task of suturing. More recently, Osa et al. [23] propose a framework for online trajectory planning. The framework uses a statistical method to model the distribution of demonstrated trajectories. As a result, trajectories for a dynamic environment can be inferred from the conditional distribution.

Reinforcement learning (RL) has become a promising approach to modeling an autonomous agent [24, 25, 26, 27, 28]. RL can mimic human learning behavior to maximize the long-term reward. As a result, RL enables a robot to learn on its own and partially removes the need for experts. Examples of such agents are the box-pushing robot [29], pole-balancing humanoid [30], helicopter control [31], soccer-playing agent [32], and table-tennis-playing agent [33]. Furthermore, RL has been applied to a number of surgical tasks [34, 35]. For example, Chen et al. [36] propose a data-driven workflow that combines Programming by Demonstration [37] with RL. The workflow encodes the inverse kinematics using trajectories from human demonstrations. RL is then used to minimize the noise and adapt the learned inverse kinematics to the online environment.

A recent breakthrough [38] combines neural networks with RL (deep RL), which enables traditional RL methods to work properly in high-dimensional environments. In particular, Thananjeyan et al. [12] use a deep RL algorithm (TRPO) to learn the tensioning policy for a pinch point in the pattern cutting task. The tensioning problem is described as a Markov Decision Process [39] in which each action moves the gripper 1 mm along one of four directions in the 2D space. A state of the environment is a state of the deformable sheet, which is represented by a rectangular mesh of point masses. The goal is to minimize the symmetric difference between the ideal contour and the achieved contour cut. This approach, however, uses a single pinch point to complete the pattern cutting task. In this paper, we examine the use of multiple pinch points over the course of cutting. Finally, we compare the cutting accuracy of our proposed scheme with two baseline methods described in [12]: 1) the non-tensioned scheme and 2) SDRLT. Sections III and IV present the proposed scheme in more detail.
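To make the action space and the accuracy metric concrete, the following is a minimal Python sketch. It assumes the four directions are the axis-aligned directions of the 2D plane and approximates each contour by the set of grid cells it encloses; all names (`ACTIONS`, `apply_action`, `symmetric_difference_area`) are hypothetical and are not taken from the simulator of [12].

```python
# Hypothetical names throughout; this is not the simulator's API.
# Assumed action set: move the gripper 1 mm along +x, -x, +y, or -y.
ACTIONS = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def apply_action(gripper_xy, action):
    """Return the gripper position (in mm) after one tensioning step."""
    dx, dy = ACTIONS[action]
    return (gripper_xy[0] + dx, gripper_xy[1] + dy)


def symmetric_difference_area(ideal_cells, actual_cells):
    """Count grid cells enclosed by exactly one of the two contours."""
    return len(set(ideal_cells) ^ set(actual_cells))


# A 4x4 square as the ideal region, and the same square shifted by one cell.
ideal = {(x, y) for x in range(4) for y in range(4)}
actual = {(x, y) for x in range(1, 5) for y in range(4)}
error = symmetric_difference_area(ideal, actual)
new_position = apply_action((0, 0), "up")
```

A perfect cut gives a symmetric difference of zero, so a lower value means a more accurate cut.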

III Proposed Scheme

III-A Preliminary

Fig. 2: A deformable sheet is represented by a mesh of point masses. The left figure illustrates a pinch point without tensioning. The right figure illustrates a pinch point that is tensioned along vertical direction.

As mentioned earlier, the tensioning problem is described as a Markov Decision Process in which a state of the environment is represented by a mesh of point masses. Initially, these points are evenly spaced, with one fixed distance between neighbors in the horizontal direction and another in the vertical direction of the 2D plane, as illustrated in Fig. 2. Each point is indexed and has a position in 3D space, so the state of the environment at a given time step is the set of positions of all points at that time.

To simulate the deformable material properly, each point is connected to its neighbors by a spring force. This force maintains the distance between neighboring points. A point is moved to a cut set if it no longer has any constraints with its neighbors, i.e., the point has been cut by the scissor. When an external force is applied to tension a pinch point, the positions of its neighbors are calculated using Hooke's law [12]. The update depends on two time-constant parameters, a spring constant, and gravity, which acts along the z-axis. A tensioning policy is then defined as a mapping from the states of the environment to a probability distribution over the set of actions.
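As an illustration of the spring coupling described above, here is a minimal Python sketch of a Hooke's-law force between neighboring point masses. It is a generic textbook formulation written under stated assumptions, not the update rule of the simulator in [12]: the time-constant parameters and gravity are omitted, and the names `spring_force`, `net_force`, `rest_length`, and `k` are hypothetical.

```python
import math


def spring_force(p, q, rest_length, k):
    """Hooke's-law force on point p from a spring connecting p to q."""
    delta = [qi - pi for pi, qi in zip(p, q)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        return [0.0, 0.0, 0.0]
    stretch = dist - rest_length  # positive when the spring is stretched
    return [k * stretch * d / dist for d in delta]


def net_force(index, positions, neighbors, rest_length, k, cut):
    """Sum the spring forces from neighbors that are not in the cut set."""
    total = [0.0, 0.0, 0.0]
    for j in neighbors[index]:
        if j in cut:
            continue  # a cut point no longer constrains its neighbors
        force = spring_force(positions[index], positions[j], rest_length, k)
        total = [t + f for t, f in zip(total, force)]
    return total


# Three points in a row, one unit apart; the right end is tensioned to x = 2.5.
positions = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (2.5, 0.0, 0.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
force_on_middle = net_force(1, positions, neighbors, 1.0, 2.0, cut=set())
force_after_cut = net_force(1, positions, neighbors, 1.0, 2.0, cut={2})
```

In this toy example, the middle point is pulled toward the tensioned end until that end is placed in the cut set, after which no force remains.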

III-B Multi-Point Deep RL Tensioning Method

Fig. 3: Limitations of the use of one pinch point.

A robot is normally limited by spatial and mechanical constraints. Therefore, it is often infeasible to cut a complicated contour with a scissor without interruptions. One solution is to divide the contour into multiple segments and find the cutting order among these segments that minimizes the damage to the material [12]. However, with only one pinch point it is impossible to avoid damage to the material, especially near the joint areas between segments. In Fig. 3, for example, the contour is divided into two segments: segment 1 is illustrated by the orange dots and segment 2 by the dark blue dots. The pinch point is represented by a solid green dot. The gray areas denote the joint areas between the segments. After segment 1 is completely cut, a tensioning force applied to the pinch point inadvertently distorts the joint areas, i.e., we start cutting segment 2 from an improper position. This drastically reduces the cutting accuracy. To overcome this problem, we add a fixed pinch point in each joint area to avoid distortion. This approach is feasible because it is done in the setup phase, which can be arranged before the cutting process.

To make better use of pinch points, we adopt a divide-and-conquer approach: if the contour is divided into several segments, we find a separate tensioning pinch point for each segment. Because we have one gripper, only one pinch point is used for tensioning at any time. We also assume that the gripper does not affect the deformable sheet while moving to a different pinch point. This assumption is reasonable in surgical tasks where the deformable material is not too soft. In Fig. 4, for example, one pinch point is used while a segment is being cut; after that segment is cut, the gripper selects a different pinch point before cutting of the next segment starts. This approach requires finding the best pinch point for each segment, a process we call the local search. Previous work [12] instead finds a single best pinch point for all the segments, which is an intractable task.

Fig. 4: The use of multiple pinch points for tensioning.

A local search for a segment involves two steps: 1) finding a set of candidate pinch points for the segment and 2) selecting the best pinch point among the candidates. Fig. 5 describes the process of finding the candidate pinch points for a specific segment. First, we define a distance threshold and collect the candidate points around the segment: a point on the sheet is a candidate if its distance to a point in the segment satisfies the threshold. From the resulting candidate set, we take a limited number of candidates at random and place them in an initially empty set. We then remove all candidates in that set that are direct neighbors to form the final candidate set, as shown in Fig. 5. The next step is to use the TRPO algorithm to create a tensioning policy for each candidate in the final set.
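The candidate search can be sketched in Python as follows. This is an illustrative sketch under several assumptions that the text does not settle: the criterion is read as "Euclidean distance to at least one segment point is at most the threshold", direct neighbors are taken to be the eight surrounding grid points, and neighbors are removed greedily by keeping the first point encountered. The function names are hypothetical.

```python
import math
import random


def find_candidates(mesh_points, segment_points, threshold):
    """Keep mesh points within `threshold` of at least one segment point."""
    return [p for p in mesh_points
            if any(math.dist(p, q) <= threshold for q in segment_points)]


def sample_candidates(candidates, max_count, rng):
    """Draw at most `max_count` candidates uniformly at random."""
    return rng.sample(candidates, min(max_count, len(candidates)))


def drop_direct_neighbors(points, spacing):
    """Greedily discard points adjacent (8-neighborhood) to one already kept."""
    kept = []
    for p in points:
        if all(max(abs(p[0] - q[0]), abs(p[1] - q[1])) > spacing for q in kept):
            kept.append(p)
    return kept


# Toy 5x5 mesh with spacing 10 and a one-point "segment" at its center.
mesh = [(x, y) for x in range(0, 50, 10) for y in range(0, 50, 10)]
segment = [(20, 20)]
candidates = find_candidates(mesh, segment, threshold=10)
sampled = sample_candidates(candidates, 3, random.Random(0))
spread_out = drop_direct_neighbors(candidates, spacing=10)
```

Removing direct neighbors keeps the remaining candidates spread out, so fewer tensioning policies need to be trained for points that would behave almost identically.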

Fig. 5: Finding candidate pinch points of a segment.

Fig. 6: A workflow of the MDRLT method.

Fig. 6 summarizes the workflow of our Multi-point Deep RL Tensioning (MDRLT) method as follows:
  • Problem definition: A complex contour is defined in the deformable material.

  • Setup phase: This phase involves dividing the contour into multiple segments, finding cutting order between these segments, and finally adding fixed pinch points in joint areas.

  • Local search: As described earlier, the goal of this phase is to find a set of candidate tensioning pinch points in each segment.

  • Evaluation phase: This phase combines each candidate pinch point in each segment to evaluate the accuracy while cutting the whole contour.

  • Final phase: The best combination of candidate pinch points is selected together with fixed pinch points in joint areas. This phase terminates our algorithm.
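The evaluation and final phases can be sketched as a search over all combinations that pick one candidate per segment. The Python sketch below assumes an `evaluate` callback that returns the symmetric difference obtained when the whole contour is cut with a given combination; both `best_combination` and `toy_error` are hypothetical stand-ins, and the toy error function replaces the simulator purely for illustration.

```python
from itertools import product


def best_combination(candidates_per_segment, evaluate):
    """Try one candidate per segment and keep the lowest-error combination."""
    best, best_error = None, float("inf")
    for combo in product(*candidates_per_segment):
        error = evaluate(combo)
        if error < best_error:
            best, best_error = combo, error
    return best, best_error


def toy_error(combo):
    """Stand-in for a simulated cut: distance of each pick from a target x."""
    return abs(combo[0][0] - 1) + abs(combo[1][0] - 6)


candidates_per_segment = [
    [(0, 0), (1, 0)],          # candidates for the first segment
    [(5, 5), (6, 5), (7, 5)],  # candidates for the second segment
]
best, best_error = best_combination(candidates_per_segment, toy_error)
```

The number of combinations grows as the product of the candidate counts, which is one reason to cap the number of candidates per segment.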

IV Performance Evaluation

IV-A Simulation Settings

In this section, we use the simulator described in [12] with fixed settings for its simulation parameters. The distance threshold used in the local search equals 100. The maximum number of candidate pinch points per segment takes one value when the contour has two segments and another when it has more than two. The algorithms that divide the contour into segments are based on the mechanical constraints of the dVRK and are implemented in the simulator. To find the best cutting order among the segments, we use an exhaustive search for the order that provides the highest accuracy. Each tensioning policy is trained with TRPO for 20 iterations, with a batch size of 500, a step size of 0.01, and a discount factor of 1. We use the implementation of the TRPO algorithm from [40]. The cutting accuracy is also defined in the simulator as the symmetric difference between the ideal contour and the actual contour cut. The cutting reliability is measured by the standard deviation of the cutting accuracy over repeated evaluations. Finally, the simulator is significantly modified to support local search.
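The exhaustive search over cutting orders can be sketched in Python with `itertools.permutations`. The training settings quoted above are collected in a plain dictionary for reference only. `TRPO_SETTINGS`, `best_cutting_order`, and `toy_error` are hypothetical names, and the toy error function stands in for a full simulated cut.

```python
from itertools import permutations

# Training settings quoted in the text, gathered in one place for reference.
TRPO_SETTINGS = {
    "iterations": 20,
    "batch_size": 500,
    "step_size": 0.01,
    "discount": 1.0,
}


def best_cutting_order(segments, simulate_error):
    """Return the segment order with the lowest simulated cutting error."""
    return min(permutations(segments), key=simulate_error)


def toy_error(order):
    """Stand-in for the simulator: later cuts of costly segments hurt more."""
    cost = {"s1": 3, "s2": 1, "s3": 2}
    return sum(position * cost[name] for position, name in enumerate(order))


order = best_cutting_order(["s1", "s2", "s3"], toy_error)
```

With n segments there are n! possible orders, so this brute-force search is only practical for the small segment counts considered here.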

IV-B Accuracy Performance

Fig. 7: A testbed of 14 different open and closed contours.
Fig. 8: The mean of relative percentage improvement over NTB of five algorithms.
Fig. 9: The reliability comparison of five algorithms.
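| Contour | Algorithm | Eval 1 | Eval 2 | Eval 3 | Eval 4 | Eval 5 | Eval 6 | Eval 7 | Eval 8 | Eval 9 | Eval 10 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Figure A | MDRLT-1 | 29 | 31 | 46 | 34 | 50 | 31 | 40 | 38 | 40 | 39 | 37.8 |
| Figure A | MDRLT-2 | 24 | 37 | 35 | 32 | 33 | 45 | 37 | 30 | 37 | 27 | 33.7 |
| Figure B | MDRLT-1 | 33 | 67 | 55 | 53 | 50 | 60 | 42 | 36 | 33 | 54 | 48.3 |
| Figure B | MDRLT-2 | 27 | 39 | 42 | 79 | 27 | 26 | 23 | 25 | 38 | 28 | 35.4 |
| Figure C | MDRLT-1 | 36 | 44 | 44 | 42 | 47 | 34 | 50 | 43 | 45 | 53 | 43.8 |
| Figure C | MDRLT-2 | 39 | 30 | 26 | 34 | 26 | 35 | 33 | 32 | 39 | 33 | 32.7 |
| Figure D | MDRLT-1 | 144 | 137 | 130 | 139 | 136 | 136 | 137 | 129 | 129 | 133 | 135 |
| Figure D | MDRLT-2 | 40 | 34 | 42 | 36 | 45 | 32 | 36 | 44 | 39 | 25 | 37.3 |
| Figure E | MDRLT-1 | 65 | 65 | 59 | 55 | 64 | 52 | 51 | 74 | 62 | 56 | 60.3 |
| Figure E | MDRLT-2 | 38 | 44 | 34 | 51 | 34 | 39 | 35 | 36 | 36 | 36 | 38.3 |
| Figure F | MDRLT-1 | 58 | 70 | 49 | 44 | 42 | 53 | 53 | 33 | 37 | 35 | 47.4 |
| Figure F | MDRLT-2 | 36 | 38 | 34 | 28 | 35 | 28 | 25 | 28 | 40 | 27 | 31.9 |
| Figure G | MDRLT-1 | 38 | 41 | 36 | 44 | 45 | 33 | 48 | 48 | 38 | 42 | 41.3 |
| Figure G | MDRLT-2 | 19 | 25 | 22 | 23 | 22 | 22 | 26 | 25 | 29 | 24 | 23.7 |
| Figure H | MDRLT-1 | 43 | 18 | 16 | 46 | 39 | 28 | 25 | 23 | 28 | 18 | 28.4 |
| Figure H | MDRLT-2 | 17 | 17 | 20 | 20 | 13 | 24 | 17 | 16 | 17 | 21 | 18.2 |
| Figure I | MDRLT-1 | 78 | 62 | 73 | 62 | 51 | 55 | 51 | 74 | 76 | 77 | 65.9 |
| Figure I | MDRLT-2 | 37 | 40 | 42 | 44 | 38 | 43 | 39 | 52 | 39 | 46 | 42 |
| Figure J | MDRLT-1 | 19 | 18 | 13 | 12 | 20 | 19 | 21 | 18 | 19 | 19 | 17.8 |
| Figure J | MDRLT-2 | 25 | 26 | 21 | 22 | 21 | 21 | 21 | 21 | 21 | 21 | 22 |
| Figure K | MDRLT-1 | 82 | 104 | 79 | 87 | 90 | 89 | 76 | 78 | 74 | 59 | 81.8 |
| Figure K | MDRLT-2 | 48 | 47 | 39 | 45 | 56 | 57 | 49 | 81 | 42 | 43 | 50.7 |
| Figure L | MDRLT-1 | 35 | 31 | 37 | 32 | 41 | 32 | 35 | 32 | 33 | 33 | 34.1 |
| Figure L | MDRLT-2 | 14 | 15 | 18 | 12 | 12 | 15 | 15 | 14 | 15 | 20 | 15 |
| Figure M | MDRLT-1 | 21 | 24 | 20 | 21 | 21 | 25 | 23 | 18 | 17 | 23 | 21.3 |
| Figure M | MDRLT-2 | 14 | 15 | 14 | 13 | 19 | 15 | 14 | 17 | 17 | 14 | 15.2 |
| Figure N | MDRLT-1 | 38 | 34 | 102 | 25 | 34 | 26 | 28 | 32 | 18 | 56 | 39.3 |
| Figure N | MDRLT-2 | 23 | 19 | 20 | 27 | 18 | 26 | 26 | 30 | 23 | 23 | 23.5 |

TABLE I: Raw values of the accuracy test using MDRLT-1 and MDRLT-2. Each figure is cut 10 times to measure the symmetric difference between the ideal contour and the actual contour cut.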

To compare the cutting accuracy of different algorithms, we select 14 complicated multi-segment contours described in [12], as shown in Fig. 7. The black dots represent the fixed pinch points that are added in the setup phase. The red dots represent the best tensioning pinch points found by the local search. We compare the cutting accuracy and reliability of six algorithms (the first four are based on [12]):

  • Non-Tensioned Baseline (NTB): We only use the scissor to cut the contour without using the gripper.

  • Single Fixed Pinch point without tensioning (SFP): A single fixed pinch point is used without tensioning.

  • Single Tensioning Pinch point (STP): A single tensioning pinch point is used.

  • SDRLT: A single tensioning pinch point is used. We use the TRPO to find the tensioning policy for the pinch point.

  • MDRLT-1: The proposed algorithm without using the fixed pinch points in joint areas.

  • MDRLT-2: The proposed algorithm using both fixed pinch points and tensioning pinch points.

We evaluate the proposed algorithms (MDRLT-1 and MDRLT-2) in 10 simulated trials for each figure in the testbed. Table I presents the raw values of the symmetric difference between the ideal contour and the actual contour cut in this evaluation. Fig. 8 shows the mean relative percentage improvement in symmetric difference over the NTB method for the five other algorithms. We see that the use of fixed pinch points during the setup phase is decisive for the cutting accuracy: MDRLT-1 is not better than SDRLT, whereas MDRLT-2 significantly outperforms SDRLT, which is the state-of-the-art method in surgical pattern cutting.

Finally, Fig. 9 shows the absolute error while evaluating the cutting accuracy in 10 trials. This metric represents the reliability of the proposed methods. Among the three algorithms using deep RL, MDRLT-2 provides the highest reliability, as it has the lowest absolute error.
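The summary statistics can be reproduced from the raw values with Python's `statistics` module. The sketch below uses the Figure D rows of Table I. Two points are assumptions: the sample standard deviation is used as the spread measure, and because the NTB values behind Fig. 8 are not listed in Table I, the relative improvement is computed against MDRLT-1 purely as an illustration of the formula. The helper `relative_improvement` is a hypothetical name.

```python
from statistics import mean, stdev

# Figure D rows of Table I: symmetric difference over 10 evaluations.
figure_d = {
    "MDRLT-1": [144, 137, 130, 139, 136, 136, 137, 129, 129, 133],
    "MDRLT-2": [40, 34, 42, 36, 45, 32, 36, 44, 39, 25],
}


def relative_improvement(baseline_error, method_error):
    """Percentage reduction in symmetric difference relative to a baseline."""
    return 100.0 * (baseline_error - method_error) / baseline_error


# Mean (accuracy) and sample standard deviation (spread) per algorithm.
summary = {name: (mean(values), stdev(values))
           for name, values in figure_d.items()}

# Illustration only: MDRLT-1 plays the role of the baseline here, not NTB.
gain = relative_improvement(summary["MDRLT-1"][0], summary["MDRLT-2"][0])
```

The computed means match the Mean column of Table I for Figure D (135 and 37.3).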

V Conclusion

This paper introduces a multi-point approach based on deep reinforcement learning for the surgical soft tissue cutting task that is meaningful from both a practical and a theoretical perspective. On the theoretical side, the paper benchmarks the accuracy of using multiple pinch points during the course of cutting, which is the first such study to the best of our knowledge. The study also concludes that the use of fixed pinch points in joint areas is the key to significantly outperforming the state-of-the-art cutting method with respect to accuracy and reliability.

The proposed approach provides a normative workflow for ensuring safety in the surgical pattern cutting task. Moreover, it can be extended in future research to the use of multiple grippers or multiple scissors in surgical tasks, multi-layer pattern cutting in 3D space, or 3D multi-segment contours.


  • [1] A. Murali, S. Sen, B. Kehoe, A. Garg, S. McFarland, S. Patil, W. D. Boyd, S. Lim, P. Abbeel, and K. Goldberg, “Learning by observation for surgical subtasks: multilateral cutting of 3d viscoelastic and 2d orthotropic tissue phantoms,” in International Conference on Robotics and Automation (ICRA), pp. 1202–1209, 2015.
  • [2] H.-W. Nienhuys and A. F. Van der Stappen, “A surgery simulation supporting cuts and finite element deformation,” in International Conference on Medical Image Computing & Computer Assisted Intervention, 2001.
  • [3] K. A. Nichols and A. M. Okamura, “Methods to segment hard inclusions in soft tissue during autonomous robotic palpation,” IEEE Transactions on Robotics, vol. 31, no. 2, pp. 344–354, 2015.
  • [4] N. Haouchine, S. Cotin, I. Peterlik, J. Dequidt, M. S. Lopez, E. Kerrien, and M. O. Berger, “Impact of soft tissue heterogeneity on augmented reality for liver surgery,” IEEE Transactions on Visualization & Computer Graphics, vol. 1, 2015.
  • [5] T. T. Nguyen, N. D. Nguyen, F. Bello, and S. Nahavandi, “A New Tensioning Method using Deep Reinforcement Learning for Surgical Pattern Cutting,” arXiv preprint arXiv:1901.03327, 2019.
  • [6] R. A. Fisher, P. Dasgupta, A. Mottrie, A. Volpe, M. S. Khan, B. Challacombe, and K. Ahmed, “An overview of robot assisted surgery curricula and the status of their validation,” International Journal of Surgery, vol. 13, pp. 115–123, 2015.
  • [7] G. Dulan, R. V. Rege, D. C. Hogg, K. M. Gilberg-Fisher, N. A. Arain, S. T. Tesfay, and D. J. Scott, “Developing a comprehensive, proficiency-based training program for robotic surgery,” Surgery, vol. 152, no. 3, pp. 477–488, 2012.
  • [8] A. P. Stegemann, K. Ahmed, J. R. Syed, S. Rehman, K. Ghani, R. Autorino, et al., “Fundamental skills of robotic surgery: a multi-institutional randomized controlled trial for validation of a simulation-based curriculum,” Urology, vol. 81, no. 4, pp. 767–774, 2013.
  • [9] A. Shademan, R. S. Decker, J. D. Opfermann, S. Leonard, A. Krieger, and P. C. Kim, “Supervised autonomous robotic soft tissue surgery,” Science Translational Medicine, vol. 8, no. 337, 2016.
  • [10] K. Tai, A. R. El-Sayed, M. Shahriari, M. Biglarbegian, and S. Mahmud, “State of the art robotic grippers and applications,” Robotics, vol. 5, no. 2, 2016.
  • [11] G. Rateni, M. Cianchetti, G. Ciuti, A. Menciassi, and C. Laschi, “Design and development of a soft robotic gripper for manipulation in minimally invasive surgery: a proof of concept,” Meccanica, vol. 50, no. 11, pp. 2855–2863, 2015.
  • [12] B. Thananjeyan, A. Garg, S. Krishnan, C. Chen, L. Miller, and K. Goldberg, “Multilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning,” in International Conference on Robotics and Automation (ICRA), pp. 2371–2378, 2017.
  • [13] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International Conference on Machine Learning, pp. 1889–1897, 2015.
  • [14] P. Kazanzides, Z. Chen, A. Deguet, G. S. Fischer, R. H. Taylor, and S. P. DiMaio, “An open-source research kit for the da Vinci Surgical System,” in International Conference on Robotics and Automation (ICRA), pp. 6434–6439, 2014.
  • [15] N. R. Crawford, S. Cicchini, and N. Johnson, “Surgical robotic automation with tracking markers,” U.S. Patent Application No. 15/609,334, 2017.
  • [16] G. P. Moustris, S. C. Hiridis, K. M. Deliparaschos, and K. M. Konstantinidis, “Evolution of autonomous and semi‐autonomous robotic surgical systems: a review of the literature,” The International Journal of Medical Robotics and Computer Assisted Surgery, vol. 7, no. 4, pp. 375–392, 2011.
  • [17] J. Van Den Berg, S. Miller, D. Duckworth, H. Hu, A. Wan, X. Y. Fu, et al., “Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations,” in International Conference on Robotics and Automation (ICRA), pp. 2074–2081, 2010.
  • [18] J. M. Prendergast and M. E. Rentschler, “Towards autonomous motion control in minimally invasive robotic surgery,” Expert Review of Medical Devices, vol. 13, no. 8, pp. 741–748, 2016.
  • [19] D. T. Nguyen, C. Song, Z. Qian, S. V. Krishnamurthy, E. J. Colbert, and P. McDaniel, “IoTSan: fortifying the safety of IoT Systems,” arXiv preprint arXiv:1810.09551, 2018.
  • [20] C. Staub, T. Osa, A. Knoll, and R. Bauernschmitt, “Automation of tissue piercing using circular needles and vision guidance for computer aided laparoscopic surgery,” in International Conference on Robotics and Automation (ICRA), pp. 4585–4590, 2010.
  • [21] S. Sen, A. Garg, D. V. Gealy, S. McKinley, Y. Jen, and K. Goldberg, “Automating multi-throw multilateral surgical suturing with a mechanical needle guide and sequential convex optimization,” in International Conference on Robotics and Automation (ICRA), pp. 4178–4185, 2016.
  • [22] J. Schulman, A. Gupta, S. Venkatesan, M. Tayson-Frederick, and P. Abbeel, “A case study of trajectory transfer through non-rigid registration for a simplified suturing scenario,” in International Conference on Intelligent Robots and Systems (IROS), pp. 4111–4117, 2013.
  • [23] T. Osa, N. Sugita, and M. Mitsuishi, “Online trajectory planning and force control for automation of surgical tasks,” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 2, pp. 675–691, 2018.
  • [24] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications,” arXiv preprint arXiv:1812.11794, 2019.
  • [25] T. T. Nguyen, “A Multi-Objective Deep Reinforcement Learning Framework,” arXiv preprint arXiv:1803.02965, 2018.
  • [26] N. D. Nguyen, T. Nguyen, and S. Nahavandi, “System design perspective for human-level agents using deep reinforcement learning: A survey,” IEEE Access, vol. 5, pp. 27091–27102, 2017.
  • [27] N. D. Nguyen, S. Nahavandi, and T. Nguyen, “A human mixed strategy approach to deep reinforcement learning,” arXiv preprint arXiv:1804.01874, 2017.
  • [28] T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Multi-agent deep reinforcement learning with human strategies,” arXiv preprint arXiv:1806.04562, 2018.
  • [29] S. Mahadevan and J. Connell, “Automatic programming of behavior-based robots using reinforcement learning,” Artificial Intelligence, vol. 55, no. 2–3, pp. 311–365, 1992.
  • [30] S. Schaal, “Learning from demonstration,” in Advances in Neural Information Processing Systems, pp. 1040–1046, 1997.
  • [31] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang, “Autonomous inverted helicopter flight via reinforcement learning,” Experimental Robotics IX, pp. 363–372, 2006.
  • [32] M. Riedmiller, T. Gabel, R. Hafner, and S. Lange, “Reinforcement learning for robot soccer,” Journal of Autonomous Robots, vol. 27, no. 1, pp. 55–73, 2009.
  • [33] K. Mülling, J. Kober, O. Kroemer, and J. Peters, “Learning to select and generalize striking movements in robot table tennis,” International Journal of Robotics Research, vol. 32, no. 3, pp. 263–279, 2013.
  • [34] Z. Du, W. Wang, Z. Yan, W. Dong, and W. Wang, “Variable admittance control based on fuzzy reinforcement learning for minimally invasive surgery manipulator,” Sensors, vol. 17, no. 4, 2017.
  • [35] B. Yu, A. T. Tibebu, D. Stoyanov, S. Giannarou, J. H. Metzen, and E. Vander Poorten, “Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions,” International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 4, pp. 553–568, 2016.
  • [36] J. Chen, H. Y. Lau, W. Xu, and H. Ren, “Towards transferring skills to flexible surgical robots with programming by demonstration and reinforcement learning,” in International Conference on Advanced Computational Intelligence (ICACI), pp. 378–384, 2016.
  • [37] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, “Robot programming by demonstration,” Springer handbook of robotics, pp. 1371–1394, 2008.
  • [38] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
  • [39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2012.
  • [40] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in International Conference on Machine Learning, pp. 1329–1338, 2016.