I Introduction
In robotic surgery, manipulation of a deformable sheet, especially cutting along a predefined contour trajectory, is a critical task that has attracted significant research interest [1, 2, 3, 4, 5]. The pattern cutting task is one of the Fundamental Skills of Robotic Surgery (FSRS) because automating it minimizes surgeon errors, operation time, trauma, and expenses [6, 7, 8]. Furthermore, the deformable material is usually soft and elastic, which makes it difficult to perform a cutting procedure accurately [9]. Therefore, it is necessary to use a gripper [10, 11], which holds a point (pinch point) on the sheet and tensions it along an allowable set of directions with a reasonable force while a surgical scissor cuts along the contour trajectory, as shown in Fig. 1.
In other words, the pattern cutting task involves two essential steps: 1) selecting a pinch point and 2) finding a tensioning policy from that pinch point. Previous studies consider a single pinch point over the course of cutting, which is efficient only when the contour shape is simple. For a complicated contour, it is more appropriate to divide the contour into different segments; in this case, the use of one pinch point is unsafe and significantly reduces cutting accuracy [12].
In this paper, we examine a multi-point approach and compare its accuracy with that of the single-point counterpart. Because the robot has a single gripper, only one pinch point is used for tensioning at a time; the others are pinned permanently in the setup phase. Finally, we use a deep reinforcement learning algorithm, namely Trust Region Policy Optimization (TRPO) [13], as in [12], to seek the optimal tensioning policy from a pinch point. The tensioning policy determines the tensioning direction based on the current state of the sheet, the contour information, and the cutting position.
In this study, we use the simulator described in [12] to evaluate the accuracy and reliability of the proposed approach and compare its performance with the state-of-the-art method [12]. For conciseness, we call the method described in [12] the Single-point Deep Reinforcement Learning Tensioning (SDRLT) method. Furthermore, the simulator has a practical perspective because the learned tensioning policy can be re-evaluated on the well-known physical surgical system, the da Vinci Research Kit (dVRK) [14]. Simulation results show that the multi-point approach outperforms the SDRLT method and achieves an average of 55% better accuracy than the non-tensioned baseline over a set of 14 multi-segment contours.
In summary, the paper makes the following contributions:

This work provides the first study of using multiple pinch points in the pattern cutting task. The study shows the benefits of using multiple pinch points, particularly on complicated contours where the scissor must cut multiple segments to complete the whole contour trajectory.

The proposed scheme outperforms the state-of-the-art method in the pattern cutting task and serves as a premise for a significant number of research extensions, such as the use of multiple grippers, multiple scissors, and multi-layer pattern cutting in a 3D environment.

The multi-point approach imitates human demonstrations while cutting. Therefore, the proposed scheme is useful for both practical application and theoretical analysis. Finally, the proposed scheme achieves high accuracy and reliability in the pattern cutting task.
The rest of the paper is organized as follows. The next section reviews recent advances in surgical automation and the benefits of using reinforcement learning in surgical tasks. Section III presents preliminary background on the pattern cutting task and introduces our proposed scheme. Section IV discusses the experimental results of the proposed scheme, and Section V concludes the paper.
II Related Work
The use of robot-assisted surgery allows doctors to automate a complicated task with minimal errors and effort. A number of automation levels have been discussed extensively in the literature [15, 16, 17, 18, 19, 20, 21]. Early approaches required experts to create a model trajectory that is used to teach a robot to complete a designated task automatically. For example, Schulman et al. [22] use a trajectory transfer algorithm to transform human demonstrations into model trajectories. These trajectories are updated to adapt to new environment geometry and hence assist a robot in learning the task of suturing. More recently, Osa et al. [23] propose a framework for online trajectory planning. The framework uses a statistical method to model the distribution of demonstrated trajectories. As a result, it is possible to transfer the trajectories to a dynamic environment based on the conditional distribution.
Reinforcement learning (RL) has become a promising approach to modeling an autonomous agent [24, 25, 26, 27, 28]. RL mimics human learning behaviors to maximize the long-term reward. As a result, RL enables a robot to learn on its own and partially eliminates the need for experts. Examples of such agents are the box-pushing robot [29], pole-balancing humanoid [30], helicopter controller [31], soccer-playing agent [32], and table-tennis-playing agent [33]. Furthermore, RL has been utilized in a significant number of surgical tasks [34, 35]. For example, Chen et al. [36] propose a data-driven workflow that combines Programming by Demonstration [37] with RL. The workflow encodes the inverse kinematics using trajectories from human demonstrations. Finally, RL is used to minimize the noise and adapt the learned inverse kinematics to the online environment.
A recent breakthrough [38] combines neural networks with RL (deep RL), which enables traditional RL methods to work properly in high-dimensional environments. Notably, Thananjeyan et al. [12] use a deep RL algorithm (TRPO) to learn the tensioning policy from a pinch point in the pattern cutting task. The tensioning problem is described as a Markov Decision Process [39] where each action moves the gripper 1 mm along one of four directions in the 2D plane. A state of the environment is a state of the deformable sheet, which is represented by a rectangular mesh of point masses. The goal is to minimize the symmetric difference between the ideal contour and the achieved contour cut. This approach, however, uses a single pinch point to complete the pattern cutting task. In this paper, we examine the use of multiple pinch points over the course of cutting. Finally, we compare the cutting accuracy of our proposed scheme with two baseline methods described in [12]: 1) the non-tensioned scheme and 2) SDRLT. Section III and Section IV present the proposed scheme in more detail.
III Proposed Scheme
III-A Preliminary
As mentioned earlier, the tensioning problem is described as a Markov Decision Process where a state of the environment is represented by a mesh of point masses. Initially, these points are spaced evenly by a distance $d_x$ in the horizontal direction and a distance $d_y$ in the vertical direction (in the 2D plane), as illustrated in Fig. 2. The set of points is indexed by $i \in \{1, \dots, N\}$. We define $p_i(t)$ as the position of point $i$ in 3D space at time $t$. Initially, we assume that all points lie in the plane $z = 0$. Therefore, a state of the environment at time $t$ is represented by $s_t = \{p_1(t), \dots, p_N(t)\}$.
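To make the state representation concrete, the point-mass mesh described above can be sketched in code. This is a minimal illustration under our own naming and grid-spacing assumptions, not code from the simulator of [12]:

```python
import numpy as np

def make_mesh(rows, cols, dx=1.0, dy=1.0):
    """Build the initial mesh state: a (rows*cols, 3) array of point
    positions, spaced dx apart horizontally and dy vertically, z = 0."""
    xs, ys = np.meshgrid(np.arange(cols) * dx, np.arange(rows) * dy)
    return np.stack([xs.ravel(), ys.ravel(), np.zeros(rows * cols)], axis=1)

def neighbors(i, rows, cols):
    """Indices of the up/down/left/right neighbors of point i
    in a row-major grid (these are the spring-connected points)."""
    r, c = divmod(i, cols)
    out = []
    if r > 0: out.append(i - cols)
    if r < rows - 1: out.append(i + cols)
    if c > 0: out.append(i - 1)
    if c < cols - 1: out.append(i + 1)
    return out

state = make_mesh(3, 4)       # 12 point masses on a 3x4 grid
print(state.shape)            # (12, 3)
print(neighbors(0, 3, 4))     # corner point has two neighbors: [4, 1]
```

The flat array of 3D positions is the state $s_t$; each cutting or tensioning step updates it in place.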
To simulate the deformable material properly, each point is connected to its neighbors by a spring force. This force maintains the distance between neighboring points. A point is moved to a cut set if it no longer has any constraints with its neighbors, i.e., the point is cut by the scissor. When we apply an external force to tension a pinch point, the positions of its neighbors can be calculated using Hooke's law [12]:

$p_i(t+1) = p_i(t) + a\,\big(p_i(t) - p_i(t-1)\big) + b\Big(k \sum_{j \in \mathcal{N}(i)} \big(p_j(t) - p_i(t)\big) + g\Big),$

where $a$ and $b$ represent time-constant parameters, $k$ represents a spring constant, $\mathcal{N}(i)$ denotes the neighbors of point $i$, and $g$ denotes gravity. The gravity vector $g$ is aligned with the z-axis. For simplicity, the mass of each point is normalized to one. Let $A$ be the set of actions; we can then define a tensioning policy $\pi$ as a mapping function from a state $s_t$ to a probability distribution over $A$, i.e., $\pi(a \mid s_t)$.
III-B Multi-Point Deep RL Tensioning Method
A robot is normally limited by spatial and mechanical constraints. Therefore, it is infeasible to use a scissor to cut a complicated contour without interruption. One solution is to divide the contour into multiple segments and find the cutting order among these segments that minimizes the damage to the material [12]. However, with only one pinch point it is impossible to avoid damage to the material, especially near the joint areas between segments. In Fig. 3, for example, the contour is divided into two segments: segment 1 is illustrated by the orange dots and segment 2 by the dark blue dots. The pinch point is represented by a solid green dot. The gray areas denote the joint areas between the segments. After segment 1 is completely cut, a tensioning force applied to the pinch point inadvertently distorts the joint areas, i.e., we start cutting segment 2 from an improper position. This drastically reduces the cutting accuracy. To overcome this obstacle, we add a fixed pinch point in each joint area to avoid distortion. This approach is feasible because it is done in the setup phase, which can be arranged before the cutting process.
To further increase efficiency, we adopt a divide-and-conquer approach with multiple pinch points. In other words, if the contour is divided into $n$ segments, we find a set of $n$ different pinch points. Because we have one gripper, only one pinch point is used for tensioning at a time. We also assume that the gripper does not affect the deformable sheet while moving to a different pinch point. This assumption is reasonable in surgical tasks where the deformable material is not too soft. In Fig. 4, for example, we cut segment $i$ using pinch point $P_i$. After the segment is cut, the gripper moves to pinch point $P_{i+1}$ to start cutting segment $i+1$. This approach requires finding the best pinch point in each segment, a process we call the local search. Previous work [12], in contrast, searches for a single best pinch point serving all the segments, which is an intractable task.
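The divide-and-conquer cutting loop above can be sketched with stub hardware interfaces. Every class and method name here is hypothetical (a real system would drive the dVRK); the stubs only record the call sequence so the control flow is visible:

```python
class Log:
    def __init__(self): self.events = []

class Gripper:
    def __init__(self, log): self.log = log
    def move_to(self, p): self.log.events.append(("move", p))
    def grasp(self): self.log.events.append(("grasp",))
    def tension(self, a): self.log.events.append(("tension", a))
    def release(self): self.log.events.append(("release",))

class Scissors:
    def __init__(self, log): self.log = log
    def cut_to(self, w): self.log.events.append(("cut", w))

def cut_contour(segments, pinch_points, policy, gripper, scissors):
    """Divide-and-conquer cutting: one tensioning pinch point per
    segment; the single gripper re-grasps before each segment."""
    for seg, pinch in zip(segments, pinch_points):
        gripper.move_to(pinch)                 # assumed not to disturb the sheet
        gripper.grasp()
        for waypoint in seg:
            gripper.tension(policy(waypoint))  # tension direction per cut step
            scissors.cut_to(waypoint)
        gripper.release()

log = Log()
cut_contour([[(0, 1), (0, 2)], [(3, 1)]], ["P1", "P2"],
            lambda w: "up", Gripper(log), Scissors(log))
```

Running this produces the expected grasp/tension/cut/release sequence, once per segment.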
The local search process for segment $i$ involves two steps: 1) finding a set of candidate pinch points for segment $i$ and 2) selecting the best pinch point among the candidates. Fig. 5 describes the process of finding a set of candidate pinch points for a specific segment $i$. Initially, we define a distance threshold $\delta$, and then we find a set of candidate points around the segment based on $\delta$. Specifically, we select a point $p$ as a candidate if it satisfies the following condition:

$\min_{q \in S_i} d(p, q) \le \delta,$

where $d(p, q)$ denotes the distance between two points $p$ and $q$, and $S_i$ is the set of points in segment $i$. After this step, we have a set of candidates $C$. We take $m$ candidates randomly from $C$ and put them in an initially empty set $C_m$. After that, we remove from $C_m$ all candidates that are direct neighbors of another candidate, forming a set $C'$, as shown in Fig. 5. The next step is to use the TRPO algorithm to create a tensioning policy for each candidate in $C'$.
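The candidate-selection step of the local search can be sketched as follows. The function names, the NumPy-based distance computation, and the row-major neighbor test are our own assumptions, not the simulator's implementation:

```python
import random
import numpy as np

def candidate_pinch_points(points, segment, threshold, m, cols, seed=0):
    """Step 1 of the local search: keep mesh points within `threshold`
    of the segment, sample at most m of them, then drop any sampled
    point that is a direct grid neighbor of an already-kept point.
    `points` is an (N, 3) mesh over a row-major grid with `cols` columns;
    `segment` is a (K, 2) array of 2D contour points."""
    # distance from every mesh point to its nearest segment point
    d = np.linalg.norm(points[:, None, :2] - segment[None, :, :], axis=2).min(axis=1)
    candidates = [i for i in range(len(points)) if d[i] <= threshold]
    sampled = random.Random(seed).sample(candidates, min(m, len(candidates)))
    kept = []
    for i in sampled:
        # direct neighbors in a row-major grid differ by 1 (same row)
        # or by `cols` (same column); a rough test for a sketch
        if all(abs(i - j) not in (1, cols) for j in kept):
            kept.append(i)
    return kept

# 3x4 grid of mesh points in the z = 0 plane
pts = np.array([[x, y, 0.0] for y in range(3) for x in range(4)])
seg = np.array([[0.0, 0.0]])                 # a one-point toy "segment"
chosen = candidate_pinch_points(pts, seg, threshold=1.0, m=3, cols=4)
```

In the toy call, only grid indices 0, 1, and 4 fall within the threshold, and the neighbor filter then keeps a mutually non-adjacent subset of them.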
Fig. 6 summarizes the workflow of our Multi-point Deep RL Tensioning (MDRLT) method as follows:

Problem definition: A complex contour is defined on the deformable material.

Setup phase: This phase involves dividing the contour into multiple segments, finding the cutting order among these segments, and finally adding fixed pinch points in the joint areas.

Local search: As described earlier, the goal of this phase is to find a set of candidate tensioning pinch points in each segment.

Evaluation phase: This phase evaluates every combination of candidate pinch points, one per segment, by the accuracy of cutting the whole contour.

Final phase: The best combination of candidate pinch points is selected, together with the fixed pinch points in the joint areas. This phase terminates the algorithm.
IV Performance Evaluation
IV-A Simulation Settings
In this section, we use the simulator described in [12]. The distance threshold $\delta$ equals 100, and the maximum number of candidate pinch points per segment depends on whether the contour has two segments or more than two. The algorithms that divide the contour into segments are based on the mechanical constraints of the dVRK and are provided by the simulator. To find the best cutting order among the segments, we use exhaustive search for the order that provides the highest accuracy. Each tensioning policy is trained with TRPO for 20 iterations, with a batch size of 500, a step size of 0.01, and a discount factor of 1. We use the implementation of the TRPO algorithm in [40]. Cutting accuracy is also defined in the simulator as the symmetric difference between the ideal contour and the actual contour cut. Cutting reliability is measured as the standard deviation of the cutting accuracy across trials. Finally, the simulator is significantly modified to support the local search.
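For reference, the training hyperparameters quoted above can be collected in a single configuration object. The dictionary keys are our own naming, not identifiers from the simulator or the TRPO implementation of [40]:

```python
# TRPO settings used to train each tensioning policy (values from the text)
trpo_config = {
    "n_iterations": 20,    # training iterations per candidate pinch point
    "batch_size": 500,     # samples collected per iteration
    "step_size": 0.01,     # trust-region (KL) step size
    "discount": 1.0,       # undiscounted return over the cutting episode
}
```

A discount factor of 1 means the policy optimizes the total episodic reward, which matches an episodic task with a fixed-length cutting sequence.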
IV-B Accuracy Performance
Table I: Raw symmetric difference between the ideal contour and the actual contour cut over 10 evaluation trials.
Contour  Algorithm  Eval 1  Eval 2  Eval 3  Eval 4  Eval 5  Eval 6  Eval 7  Eval 8  Eval 9  Eval 10  Mean 

Figure A  MDRLT1  29  31  46  34  50  31  40  38  40  39  37.8 
MDRLT2  24  37  35  32  33  45  37  30  37  27  33.7  
Figure B  MDRLT1  33  67  55  53  50  60  42  36  33  54  48.3 
MDRLT2  27  39  42  79  27  26  23  25  38  28  35.4  
Figure C  MDRLT1  36  44  44  42  47  34  50  43  45  53  43.8 
MDRLT2  39  30  26  34  26  35  33  32  39  33  32.7  
Figure D  MDRLT1  144  137  130  139  136  136  137  129  129  133  135 
MDRLT2  40  34  42  36  45  32  36  44  39  25  37.3  
Figure E  MDRLT1  65  65  59  55  64  52  51  74  62  56  60.3 
MDRLT2  38  44  34  51  34  39  35  36  36  36  38.3  
Figure F  MDRLT1  58  70  49  44  42  53  53  33  37  35  47.4 
MDRLT2  36  38  34  28  35  28  25  28  40  27  31.9  
Figure G  MDRLT1  38  41  36  44  45  33  48  48  38  42  41.3 
MDRLT2  19  25  22  23  22  22  26  25  29  24  23.7  
Figure H  MDRLT1  43  18  16  46  39  28  25  23  28  18  28.4 
MDRLT2  17  17  20  20  13  24  17  16  17  21  18.2  
Figure I  MDRLT1  78  62  73  62  51  55  51  74  76  77  65.9 
MDRLT2  37  40  42  44  38  43  39  52  39  46  42  
Figure J  MDRLT1  19  18  13  12  20  19  21  18  19  19  17.8 
MDRLT2  25  26  21  22  21  21  21  21  21  21  22  
Figure K  MDRLT1  82  104  79  87  90  89  76  78  74  59  81.8 
MDRLT2  48  47  39  45  56  57  49  81  42  43  50.7  
Figure L  MDRLT1  35  31  37  32  41  32  35  32  33  33  34.1 
MDRLT2  14  15  18  12  12  15  15  14  15  20  15  
Figure M  MDRLT1  21  24  20  21  21  25  23  18  17  23  21.3 
MDRLT2  14  15  14  13  19  15  14  17  17  14  15.2  
Figure N  MDRLT1  38  34  102  25  34  26  28  32  18  56  39.3 
MDRLT2  23  19  20  27  18  26  26  30  23  23  23.5 
To compare the cutting accuracy of the different algorithms, we select 14 complicated multi-segment contours described in [12], as shown in Fig. 7. The black dots represent the fixed pinch points that are used in the setup phase. The red dots represent the best tensioning pinch points found by the local search. We compare the cutting accuracy and reliability of six algorithms (the first four are based on [12]):

Non-Tensioned Baseline (NTB): Only the scissor is used to cut the contour, without the gripper.

Single Fixed Pinch point without tensioning (SFP): A single fixed pinch point is used without tensioning.

Single Tensioning Pinch point (STP): A single tensioning pinch point is used.

SDRLT: A single tensioning pinch point is used, with TRPO finding the tensioning policy for that point.

MDRLT1: The proposed algorithm without using the fixed pinch points in joint areas.

MDRLT2: The proposed algorithm using both fixed pinch points and tensioning pinch points.
We evaluate the proposed algorithms (MDRLT1 and MDRLT2) in 10 simulated trials for each figure in the testbed. Table I presents the raw values of the symmetric difference between the ideal contour and the actual contour cut in this evaluation. Fig. 8 shows the mean relative percentage improvement in symmetric difference over the NTB method for five algorithms. We see that the use of fixed pinch points during the setup phase strongly determines the cutting accuracy: MDRLT1 is no better than SDRLT, but MDRLT2 significantly outperforms SDRLT, the state-of-the-art method in surgical pattern cutting.
Finally, Fig. 9 shows the absolute error of the cutting accuracy across the 10 trials. This metric represents the reliability of the proposed methods. Among the three algorithms using deep RL, MDRLT2 provides the highest reliability, as it has the lowest absolute error.
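As a concrete check of these two metrics, the ten trial scores for contour "Figure A" in Table I reproduce the reported means, and the standard deviation across trials serves as the reliability measure described in the simulation settings:

```python
from statistics import mean, pstdev

# raw symmetric-difference scores for contour "Figure A" (Table I)
mdrlt1 = [29, 31, 46, 34, 50, 31, 40, 38, 40, 39]
mdrlt2 = [24, 37, 35, 32, 33, 45, 37, 30, 37, 27]

print(round(mean(mdrlt1), 1))    # 37.8, matching the Mean column
print(round(mean(mdrlt2), 1))    # 33.7

# lower spread across trials = more reliable cutting
print(round(pstdev(mdrlt1), 2), round(pstdev(mdrlt2), 2))
```

On this contour, MDRLT2 has both the lower mean symmetric difference and the smaller spread across trials.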
V Conclusion
This paper introduces a multi-point approach based on deep reinforcement learning for the surgical soft tissue cutting task that is meaningful from both practical and theoretical perspectives. On the theoretical side, the paper benchmarks the accuracy of using multiple pinch points over the course of cutting, which, to the best of our knowledge, is the first such study. The study also concludes that the use of fixed pinch points in the joint areas is the key to significantly outperforming the state-of-the-art cutting method with respect to accuracy and reliability.
The proposed approach provides a normative workflow to ensure safety in the surgical pattern cutting task. Moreover, it can be extended in a variety of future research directions, such as the use of multiple grippers or multiple scissors in surgical tasks, multi-layer pattern cutting in 3D space, or 3D multi-segment contours.
References
 [1] A. Murali, S. Sen, B. Kehoe, A. Garg, S. McFarland, S. Patil, W. D. Boyd, S. Lim, P. Abbeel, and K. Goldberg, “Learning by observation for surgical subtasks: multilateral cutting of 3D viscoelastic and 2D orthotropic tissue phantoms,” in International Conference on Robotics and Automation (ICRA), pp. 1202–1209, 2015.
 [2] H.W. Nienhuys and A. F. Van der Stappen, “A surgery simulation supporting cuts and finite element deformation,” in International Conference on Medical Image Computing & Computer Assisted Intervention, 2001.
 [3] K. A. Nichols and A. M. Okamura, “Methods to segment hard inclusions in soft tissue during autonomous robotic palpation,” IEEE Transactions on Robotics, vol. 31, no. 2, pp. 344–354, 2015.
 [4] N. Haouchine, S. Cotin, I. Peterlik, J. Dequidt, M. S. Lopez, E. Kerrien, and M. O. Berger, “Impact of soft tissue heterogeneity on augmented reality for liver surgery,” IEEE Transactions on Visualization & Computer Graphics, vol. 1, 2015.
 [5] T. T. Nguyen, N. D. Nguyen, F. Bello, and S. Nahavandi, “A New Tensioning Method using Deep Reinforcement Learning for Surgical Pattern Cutting,” arXiv preprint arXiv:1901.03327, 2019.
 [6] R. A. Fisher, P. Dasgupta, A. Mottrie, A. Volpe, M. S. Khan, B. Challacombe, and K. Ahmed, “An overview of robot assisted surgery curricula and the status of their validation,” International Journal of Surgery, vol. 13, pp. 115–123, 2015.
 [7] G. Dulan, R. V. Rege, D. C. Hogg, K. M. Gilberg-Fisher, N. A. Arain, S. T. Tesfay, and D. J. Scott, “Developing a comprehensive, proficiency-based training program for robotic surgery,” Surgery, vol. 152, no. 3, pp. 477–488, 2012.
 [8] A. P. Stegemann, K. Ahmed, J. R. Syed, S. Rehman, K. Ghani, R. Autorino, et al., “Fundamental skills of robotic surgery: a multi-institutional randomized controlled trial for validation of a simulation-based curriculum,” Urology, vol. 81, no. 4, pp. 767–774, 2013.
 [9] A. Shademan, R. S. Decker, J. D. Opfermann, S. Leonard, A. Krieger, and P. C. Kim, “Supervised autonomous robotic soft tissue surgery,” Science Translational Medicine, vol. 8, no. 337, 2016.
 [10] K. Tai, A. R. ElSayed, M. Shahriari, M. Biglarbegian, and S. Mahmud, “State of the art robotic grippers and applications,” Robotics, vol. 5, no. 2, 2016.
 [11] G. Rateni, M. Cianchetti, G. Ciuti, A. Menciassi, and C. Laschi, “Design and development of a soft robotic gripper for manipulation in minimally invasive surgery: a proof of concept,” Meccanica, vol. 50, no. 11, pp. 2855–2863, 2015.
 [12] B. Thananjeyan, A. Garg, S. Krishnan, C. Chen, L. Miller, and K. Goldberg, “Multilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning,” in International Conference on Robotics and Automation (ICRA), pp. 2371–2378, 2017.

 [13] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International Conference on Machine Learning, pp. 1889–1897, 2015.
 [14] P. Kazanzides, Z. Chen, A. Deguet, G. S. Fischer, R. H. Taylor, and S. P. DiMaio, “An open-source research kit for the da Vinci Surgical System,” in International Conference on Robotics and Automation (ICRA), pp. 6434–6439, 2014.
 [15] N. R. Crawford, S. Cicchini, and N. Johnson, “Surgical robotic automation with tracking markers,” U.S. Patent Application No. 15/609,334, 2017.
 [16] G. P. Moustris, S. C. Hiridis, K. M. Deliparaschos, and K. M. Konstantinidis, “Evolution of autonomous and semi-autonomous robotic surgical systems: a review of the literature,” The International Journal of Medical Robotics and Computer Assisted Surgery, vol. 7, no. 4, pp. 375–392.
 [17] J. Van Den Berg, S. Miller, D. Duckworth, H. Hu, A. Wan, X. Y. Fu, et al., “Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations,” in International Conference on Robotics and Automation (ICRA), pp. 2074–2081, 2010.
 [18] J. M. Prendergast and M. E. Rentschler, “Towards autonomous motion control in minimally invasive robotic surgery,” Expert Review of Medical Devices, vol. 13, no. 8, pp. 741–748, 2016.
 [19] D. T. Nguyen, C. Song, Z. Qian, S. V. Krishnamurthy, E. J. Colbert, and P. McDaniel, “IoTSan: fortifying the safety of IoT Systems,” arXiv preprint arXiv:1810.09551, 2018.
 [20] C. Staub, T. Osa, A. Knoll, and R. Bauernschmitt, “Automation of tissue piercing using circular needles and vision guidance for computer aided laparoscopic surgery,” in International Conference on Robotics and Automation (ICRA), pp. 4585–4590, 2010.
 [21] S. Sen, A. Garg, D. V. Gealy, S. McKinley, Y. Jen, and K. Goldberg, “Automating multithrow multilateral surgical suturing with a mechanical needle guide and sequential convex optimization,” in International Conference on Robotics and Automation (ICRA), pp. 4178–4185, 2016.
 [22] J. Schulman, A. Gupta, S. Venkatesan, M. Tayson-Frederick, and P. Abbeel, “A case study of trajectory transfer through non-rigid registration for a simplified suturing scenario,” in International Conference on Intelligent Robots and Systems (IROS), pp. 4111–4117, 2013.
 [23] T. Osa, N. Sugita, and M. Mitsuishi, “Online trajectory planning and force control for automation of surgical tasks,” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 2, pp. 675–691, 2018.
 [24] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications,” arXiv preprint arXiv:1812.11794, 2019.
 [25] T. T. Nguyen, “A Multi-Objective Deep Reinforcement Learning Framework,” arXiv preprint arXiv:1803.02965, 2018.
 [26] N. D. Nguyen, T. Nguyen, and S. Nahavandi, “System design perspective for human-level agents using deep reinforcement learning: a survey,” IEEE Access, vol. 5, pp. 27091–27102, 2017.
 [27] N. D. Nguyen, S. Nahavandi, and T. Nguyen, “A human mixed strategy approach to deep reinforcement learning,” arXiv preprint arXiv:1804.01874, 2017.
 [28] T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Multiagent deep reinforcement learning with human strategies,” arXiv preprint arXiv:1806.04562, 2018.
 [29] S. Mahadevan and J. Connell, “Automatic programming of behavior-based robots using reinforcement learning,” Artificial Intelligence, vol. 55, no. 2–3, pp. 311–365, 1992.
 [30] S. Schaal, “Learning from demonstration,” in Advances in Neural Information Processing Systems, pp. 1040–1046, 1997.
 [31] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang, “Autonomous inverted helicopter flight via reinforcement learning,” Experimental Robotics IX, pp. 363–372, 2006.
 [32] M. Riedmiller, T. Gabel, R. Hafner, and S. Lange, “Reinforcement learning for robot soccer,” Journal of Autonomous Robots, vol. 27, no. 1, pp. 55–73, 2009.
 [33] K. Mülling, J. Kober, O. Kroemer, and J. Peters, “Learning to select and generalize striking movements in robot table tennis,” International Journal of Robotics Research, vol. 32, no. 3, pp. 263–279, 2013.
 [34] Z. Du, W. Wang, Z. Yan, W. Dong, and W. Wang, “Variable admittance control based on fuzzy reinforcement learning for minimally invasive surgery manipulator,” Sensors, vol. 17, no. 4, 2017.
 [35] B. Yu, A. T. Tibebu, D. Stoyanov, S. Giannarou, J. H. Metzen, and E. Vander Poorten, “Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions,” International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 4, pp. 553–568, 2016.
 [36] J. Chen, H. Y. Lau, W. Xu, and H. Ren, “Towards transferring skills to flexible surgical robots with programming by demonstration and reinforcement learning,” in International Conference on Advanced Computational Intelligence (ICACI), pp. 378–384, 2016.
 [37] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, “Robot programming by demonstration,” Springer Handbook of Robotics, pp. 1371–1394, 2008.
 [38] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
 [39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2012.
 [40] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in International Conference on Machine Learning, pp. 1329–1338, 2016.