Manipulation of soft tissues is a problem of interest to many researchers in the surgical robotics field [1-3]. Surgical scissors are an efficient tool commonly used to cut through soft tissues. The deformation behaviour of such deformable substances is highly nonlinear, which makes manipulation and precise cutting challenging. Other surgical tools, e.g. robotic grippers [6, 7], are needed to pinch and tension the soft tissue to facilitate cutting. The tensioning direction and force need to be adjusted adaptively as cutting proceeds along a predefined trajectory [8, 9].
Surgical pattern cutting is one of the skills required of surgical residents, as listed in the Fundamentals of Laparoscopic Surgery (FLS) training suite, and of robotic surgeons, as included in the Fundamental Skills of Robotic Surgery (FSRS) [11-13]. Automating surgical tasks can help because it reduces surgeon workload and errors as well as operating time, trauma and expense. Different levels of surgical automation have been studied broadly in the literature [14-19].
Specifically, Shamaei et al. introduced a teleoperated architecture that facilitates cooperation between a human surgeon and an autonomous robot in executing complicated laparoscopic surgical tasks. The human surgeon can supervise the slave robot and intervene at any time during the operation. Findings from that study showed a reduction in surgical time when the human and robot collaborated, compared with the performance of a human operator alone. Osa et al. introduced a framework that addresses two problems of surgical automation in robotic surgery: online trajectory planning and dynamic force control. By learning spatial motion and contact force simultaneously from demonstrations, the framework is able to plan trajectories and control force in real time. Experiments on cutting soft tissue and tying knots showed the robustness and stability of the framework under dynamic conditions.
Machine learning in general, and reinforcement learning (RL) [22, 23] in particular, has been used in a number of studies on the automation of surgical tasks [24, 25]. The ability of RL to solve sequential decision-making problems makes it suitable for automating complicated tasks. Recent developments in deep learning [27-29] have made RL a robust tool for dealing with high-dimensional problems. Chen et al. combined programming by demonstration and RL for motion control of flexible manipulators in minimally invasive surgery. Experiments on tube insertion in 2D space and circle following in 3D space showed the effectiveness of the RL-based model. Recently, Baek et al. proposed the use of a probabilistic roadmap and RL for optimal path planning in a dynamic environment. The method was able to automate resection in cholecystectomy by planning a collision-free path in a laparoscopic surgical robot system.
Notably, Thananjeyan et al. investigated a method to learn the tensioning force for soft tissue cutting using deep reinforcement learning (DRL), namely trust region policy optimization (TRPO). The tensioning problem is modelled as a Markov decision process in which the action set consists of 1 mm movements of the tensioning arm in 2D space. The proposed method is evaluated using a simulator that models a planar deformable sheet as a rectangular mesh of point masses. The performance of the proposed method and its competing models is measured by computing the symmetric difference between the predefined pattern and the actual cut. The performance obtained in the experiments on multilateral surgical pattern cutting in 2D orthotropic gauze is superior to that of conventional models, e.g. fixed tensioning and analytic tensioning. The method automatically learns the tensioning policy for a single pinch point that stays fixed during the entire cutting process, regardless of the complexity of the cutting pattern. This is a disadvantage when the cutting pattern is complex, because such a pattern requires multiple pinch points as cutting proceeds.
In this study, we propose a multiple pinch point approach based on DRL to learn the tensioning policy effectively for surgical gauze cutting. Throughout this paper, we analyse and highlight the advantages of our approach compared with existing methods, i.e. no-tension, fixed, analytic, and single pinch point DRL. To facilitate unbiased comparisons, we use the same simulator as the single pinch point DRL study to model the deformable surgical gauze in our experiments. The next section describes the deformable sheet simulator in detail. Section 3 presents the proposed multiple pinch point approach to tensioning policy learning using DRL. Experimental results and discussions are presented in Section 4, followed by conclusions and future work in Section 5.
2 Deformable Sheet Simulator
We use a finite-element simulator to test the proposed algorithm and its competing methods, as in Thananjeyan et al. The deformable gauze is modelled as a square planar sheet of mesh points whose locations comprise the state of the gauze. When the gauze is tensioned at a pinch point, the mesh points move, and their new coordinates are observed as the new state of the sheet.
The sheet is initialized with equally spaced mesh points, defined as , and locations of these points are denoted as in the global coordinate frame, . Thus, the state of the sheet at time point t is defined as . Movements of the tensioning arm are assumed to be within the plane, and therefore the initial state of the sheet . The state is then repeatedly updated through the simulation as:
To simulate the state of the sheet when tensioning, the location of each mesh point is updated at each time step :
where and are time-constant parameters that specify the rate at which the sheet reacts to an applied external force , is a spring constant, is a model that characterizes the interactions between vertices . Cutting is modelled as removing the vertices on the trajectory from the mesh so that these vertices no longer affect their neighbors.
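The following is a minimal sketch of a mesh-based sheet with pinching and cutting, intended only to illustrate the ideas described above. It is not the simulator used in this study: the relaxation rule in `step`, the `stiffness` value, and all function names (`make_sheet`, `cut`, `step`) are assumptions introduced for illustration and do not reproduce Eq. (2).

```python
import numpy as np

def make_sheet(n, spacing=1.0):
    """Return (positions, neighbors) for an n x n planar grid of mesh points."""
    ij = [(i, j) for i in range(n) for j in range(n)]
    index = {p: k for k, p in enumerate(ij)}
    positions = np.array([[i * spacing, j * spacing] for i, j in ij], dtype=float)
    neighbors = {k: set() for k in range(len(ij))}
    for (i, j), k in index.items():
        for di, dj in ((1, 0), (0, 1)):
            q = (i + di, j + dj)
            if q in index:
                neighbors[k].add(index[q])
                neighbors[index[q]].add(k)
    return positions, neighbors

def cut(neighbors, vertex):
    """Model a cut by detaching a vertex so it no longer affects its neighbors."""
    for other in neighbors.pop(vertex, set()):
        neighbors[other].discard(vertex)

def step(positions, rest, neighbors, pinned, stiffness=0.2):
    """One relaxation step (hypothetical rule, not the paper's update):
    each free vertex moves toward the position that restores the rest
    offsets to its remaining neighbors; pinned vertices are held in place."""
    new = positions.copy()
    for k, nbrs in neighbors.items():
        if k in pinned or not nbrs:
            continue
        target = np.mean([positions[m] + (rest[k] - rest[m]) for m in nbrs], axis=0)
        new[k] += stiffness * (target - positions[k])
    for k, p in pinned.items():
        new[k] = p
    return new
```

In this sketch, tensioning corresponds to adding the pinch vertex to `pinned` with a displaced target position, and calling `step` repeatedly lets the surrounding mesh points follow.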
For a pinch point , the tensioning problem is specified as constraining location of this pinch point to a position . The tensioning policy is defined as:
Fig. 1 shows the simulated sheet under four different tensioning directions and the reaction of the mesh points to the tensioning. Tensioning must adapt to the deformation of the sheet at each time step of the cutting process. In this study, we present a multiple pinch point tensioning approach in which the cutting contour is segmented and a separate pinch point is used for each segment to improve the cutting accuracy.
3 Multiple Pinch Points DRL Tensioning Method
Cutting with scissors can only proceed in the direction in which the scissors point. In some cases, the scissors would need to rotate through a full 360 degrees to complete a complex contour in a single cut. This may not be possible in robotic surgery, where cutting arms are constrained by a rotation limit. Therefore, cutting contours are normally broken into several segments, with each segment cut from a different starting point, namely a notch point (Fig. 2).
We first divide the entire cutting trajectory into several segments and choose a suitable pinch point for each segment. The final number of pinch points is therefore equal to the number of segments. We then apply the TRPO algorithm to learn a tensioning policy for each segment based on the corresponding candidate pinch points. Once this step is complete for every segment, we run a final aggregate step that systematically chooses the best pinch point and its corresponding policy for each segment. This ensures that the whole contour is cut continuously with a smooth transition between segments. The step is important because tensioning moves the vertices of the mesh, so when each segment is treated separately, the termination point of the previously cut segment needs to be matched with the notch point of the next segment. Our approach is diagrammed in detail in Fig. 3.
3.1 Dynamic Multiple Pinch Point Selection
Selecting multiple pinch points for tensioning over multiple segments consists of four consecutive steps, outlined as follows.
Step 1: Select candidate pinch points – We first divide the contour into N different segments based on local minima and maxima using directional derivatives. We then list all candidate pinch points satisfying the condition that two arms of the robot (cutting and pinch arms) are not conflicting. These candidate pinch points are grouped into N groups (i=1,2,…,N) (where N is the number of segments) with the following constraint. Given a trajectory of the i–th segment of the contour, a pinch point p in can serve as a candidate for if:
where denotes the Euclidean distance between points a and b, is a point belongs to segment , and denotes a distance threshold. In the simulation, we select =50 if a contour has more than 2 segments and =100 if the contour has no more than 2 segments.
Step 2: Pruning pinch points – To improve the quality of the selected pinch points, we prune redundant pinch points as follows:
All direct neighbors of a pinch point are removed.
We randomly select 30 pinch points for each segment (10 pinch points if the number of segments is greater than 2). This approach is more robust than that of the previous study because only high-quality pinch points are selected: the selected points are close to the contour, and pruning neighboring pinch points prevents several of them from clustering in the same local area. The selected pinch points are therefore distributed evenly over the areas closest to the contour.
Step 3: Local training – Train a tensioning policy for each selected pinch point in each segment. This is only a local training, i.e., the policy is trained for cutting only one individual segment. We modify the simulator so that training is conducted within a designated segment rather than over the whole contour as in the previous work. This approach significantly reduces the total training time for all candidate pinch points.
Step 4: Find optimal set of pinch points – Find the order of segments that minimizes the cutting error. Given a set of segments, there is an optimal cutting order that minimizes the cutting error. We find this order by brute-force search: we go through all segment permutations and perform a cut for each permutation. This is feasible because the number of segments is normally small. During this search we assume that no pinch point is used, and the order with the lowest cutting error is selected.
Using that order, we go through all possible permutations of candidate pinch points, perform the experiment (cutting the entire contour) for each, and select the permutation that provides the best score. We separate training and evaluation into two steps. This significantly reduces the total time needed to find the optimal pinch points, because training a policy for every permutation of pinch points would be infeasible. The approach is therefore practical for real-world application. For each pinch point, we have two possible actions: fixed or tensioning. In this study, to reduce the number of configurations, we apply DRL tensioning only to the last pinch point in a permutation, while the other pinch points are kept with the fixed action. For example, in a permutation of 4 pinch points (for a contour of 4 segments), the first three are fixed pinch points, while the last uses a DRL tensioning policy.
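The selection, pruning and search steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used in this study: the function names, the treatment of "direct neighbors" as the 8-connected grid neighbors, and the user-supplied `cutting_error` and `score` callables (which would wrap simulator runs) are all hypothetical.

```python
import itertools
import math
import random

def candidates_for_segment(mesh_points, segment, threshold):
    """Step 1: keep mesh points within `threshold` (Euclidean distance)
    of at least one point on the segment."""
    return [p for p in mesh_points
            if min(math.dist(p, s) for s in segment) <= threshold]

def prune_neighbors(points, spacing):
    """Step 2 (greedy variant, an assumption): keep a point only if no
    already-kept point is one of its direct (8-connected) grid neighbors."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > 1.5 * spacing for q in kept):
            kept.append(p)
    return kept

def sample_points(points, k, rng):
    """Step 2: randomly keep at most k of the pruned candidates."""
    return rng.sample(points, min(k, len(points)))

def best_order(segments, cutting_error):
    """Step 4a: brute-force search over all segment orders.
    `cutting_error` maps an ordered list of segments to an error value."""
    return min(itertools.permutations(range(len(segments))),
               key=lambda order: cutting_error([segments[i] for i in order]))

def best_pinch_points(candidate_groups, score):
    """Step 4b: try one candidate per segment in every combination and
    keep the combination with the highest score (higher is assumed better)."""
    return max(itertools.product(*candidate_groups), key=score)
```

Because training (Step 3) is done once per candidate and only the evaluation is repeated per combination, the exhaustive loops in `best_order` and `best_pinch_points` stay tractable when the number of segments and sampled candidates is small.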
3.2 Tensioning Policy Learning with DRL
For unbiased comparisons with existing methods, we use the TRPO algorithm to learn a policy for the tensioning arm. The goal of learning is to minimize the cutting error between the desired contour and the actual cut trajectory. The tensioning problem is modelled as a Markov decision process:
where is the state space, is 1mm movement actions of the tensioning arm in the x and y directions, is unknown dynamics model, is the reward structure and is the fixed time horizon. The robot is given zero reward at all time steps except the last step where it receives a reward equivalent to symmetric difference between the marked contour and the actual contour. The reinforcement learning policy is learned to optimize the expected reward :
We use the TRPO implementation in the rllab framework to optimize the policy. The state space is configured as a vector combining the time index of the trajectory, the locations of fiducial points selected randomly on the sheet, and the displacement vector from the original pinch point. The number of fiducial points chosen is 12 for our experiments. This vector is assumed to represent the state of the sheet at any time point.
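As a rough illustration of the state, action and reward structure described in this section, the sketch below assembles the state vector from the time index, 12 fiducial point locations and the pinch point displacement. The names (`ACTIONS`, `choose_fiducials`, `build_state`, `terminal_reward`) are hypothetical, and the negative sign on the terminal reward is an assumption made so that maximizing reward minimizes the cutting error.

```python
import numpy as np

# Four 1 mm tensioning moves in the x and y directions.
ACTIONS = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (0.0, 1.0), 3: (0.0, -1.0)}
NUM_FIDUCIALS = 12

def choose_fiducials(num_mesh_points, rng):
    """Pick the indices of the fiducial mesh points at random, once per sheet."""
    return rng.choice(num_mesh_points, size=NUM_FIDUCIALS, replace=False)

def build_state(time_index, mesh_positions, fiducial_ids, pinch_displacement):
    """Concatenate time index, fiducial locations and pinch displacement."""
    return np.concatenate((
        [float(time_index)],
        mesh_positions[fiducial_ids].ravel(),
        np.asarray(pinch_displacement, dtype=float),
    ))

def terminal_reward(step, horizon, symmetric_difference):
    """Zero at every step except the last, where the symmetric difference
    is returned with a negative sign (sign convention is an assumption)."""
    return -float(symmetric_difference) if step == horizon - 1 else 0.0
```

For a planar sheet this gives a 27-dimensional state (1 time index, 24 fiducial coordinates, 2 displacement components), which stays the same size regardless of the mesh resolution.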
4 Simulation Results and Discussions
4.1 Cutting Accuracy
We use the same 17 contour shapes of differing complexity as the single pinch point DRL study for the cutting experiments, for ease of comparison. Cutting accuracy is evaluated based on the symmetric difference between the actual cut contour and the desired contour. We compare our method with four existing methods: no-tensioning, fixed, analytic and single pinch point DRL. Descriptions of these methods are below:
No-Tensioning: Cutting proceeds with no assisted tension. The gauze is only suspended with clips at four corners while being cut.
Fixed Tensioning: The gauze is pinched at a fixed point when cutting proceeds through the entire contour.
Analytic Tensioning: The tension is planned based on the direction and magnitude of the error of the cutting tool and the nearest point on the marked contour.
Single Pinch Point DRL: A tensioning policy learned with DRL uses a single pinch point for the entire contour. This method was proposed by Thananjeyan et al.
Multiple Pinch Point DRL (MDRL): The contour is broken into several segments and each segment is cut using a tensioning policy with a corresponding pinch point. This method is described in detail in Section 3. The total time to find the optimal policy using MDRL is reasonable and can be controlled through the distance threshold in Eq. (4). Depending on the number of segments, the number of selected pinch points, and the value of this threshold, the whole process takes 6 to 15 hours to find the optimal set of pinch points. Because we separate training and evaluation into two steps, it is possible to accelerate the process by scheduling them to run in parallel.
Each policy is trained for 20 iterations with a batch size of 500 for each contour shape. After training, each shape is tested 20 times with the learned policy and the average accuracy is reported. The simulation results of the five competing tensioning methods on the 17 contours are presented in Table 1. The red dots on each shape represent the optimal pinch points chosen by our algorithm. Each segment has a corresponding pinch point, unlike the single pinch point method, where the entire contour has only one pinch point. The no-tensioning method serves as the baseline, and its results are reported as the symmetric difference between the marked contour and the actual cut contour. Results of the other methods are reported as the percentage improvement over this baseline. The proposed MDRL outperforms all existing methods in terms of average accuracy: it achieves an improvement of 50.6% over the baseline, whilst the single pinch point DRL obtains an improvement of 43.3%. The standard deviation of the MDRL method, at 4.1, is smaller than that of the DRL method, at 6.3, which indicates that the proposed multiple pinch point method is more stable than the single pinch point DRL method. The last four shapes show equivalent performance between single pinch point DRL and multiple pinch point DRL because only one segment is created for each of these contours. Only a single pinch point is therefore generated for the entire contour, and our proposed method reduces to the single pinch point method.
Fig. 4 shows the graphical comparisons of four competing methods where MDRL dominates all other methods in terms of average performance. On average, the analytic method is the worst performer while the single pinch point DRL method is the most unstable method as its standard deviation is the greatest among the competing methods.
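To make the accuracy metric used in this section concrete, the sketch below computes a symmetric difference and the percentage improvement over a baseline. It assumes, purely for illustration, that the desired and actual cuts have been rasterized into boolean masks of the enclosed region; the simulator may compute the symmetric difference differently, and the function names are hypothetical.

```python
import numpy as np

def symmetric_difference(desired_mask, actual_mask):
    """Count the cells covered by exactly one of the two regions."""
    return int(np.logical_xor(desired_mask, actual_mask).sum())

def improvement_percent(baseline_error, method_error):
    """Percentage reduction in error relative to the no-tensioning baseline."""
    return 100.0 * (baseline_error - method_error) / baseline_error
```

With this convention, a perfect cut has a symmetric difference of zero, and a method that halves the baseline error scores an improvement of 50%.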
4.2 Robustness Testing
An autonomous policy learned with DRL in a simulation environment can be vulnerable to overfitting. Thananjeyan et al. checked the robustness of the single pinch point DRL tensioning method on different resolutions of the simulated sheet, characterized by the number of vertices representing the sheet. The test results on six different resolutions, from 400 to 4000 vertices, showed that policies trained on low resolutions can still yield good performance, and suggested the use of 625 (25x25) vertices to obtain the greatest cutting accuracy. Therefore, in this study, we use this resolution setting, i.e. 25x25 vertices, to evaluate our algorithm and to test the robustness of the competing methods to varying process noise and gravity force.
Gaussian noise at different levels is added to the mesh point update formula, i.e. Eq. (2). We run 10 trials for each of 10 noise levels. The effect of noise on the performance score of the different methods applied to shape 11 is presented in Fig. 5. MDRL is superior to the other methods in terms of improvement over the no-tensioning method. This dominance demonstrates the robustness of MDRL against the process noise injected into the simulation environment.
We use to learn tensioning policies for the single pinch point DRL and MDRL methods. All competing methods are then tested on different force magnitudes, ranging from 0 to 3800. The results in terms of improvement over the no-tensioning method using shape 11 are presented in Fig. 6. Clearly, the proposed MDRL method outperforms all other methods as it achieves the best performance over the entire testing range of gravity force. The single pinch point DRL method is the second-best algorithm whilst the analytic tensioning technique is completely dominated by the remaining methods.
5 Conclusions and Future Work
This paper presents a dynamic multiple pinch point approach to effectively learning a time-varying tensioning policy based on DRL for cutting deformable surgical tissue. The proposed MDRL algorithm has been tested on a deformable sheet simulator and its performance has been compared with that of several existing methods. Simulation results demonstrate the superiority of our method over its competing methods. Different process noise levels and external gravity forces were added to the simulation environment to test the robustness of the proposed MDRL method. Experimental results show that MDRL is robust to noise and external force and that its robustness is superior to that of its competing methods. The current method has a disadvantage regarding the number of tensioning directions, which is limited to four in this study. Future work would be to increase the number of tensioning directions to improve the cutting accuracy of the robot. This is a natural and necessary extension, because tensioning in practice may be applied in an arbitrary direction depending on the trajectory complexity and the deformation level of the tissue.
-  Langø, T., Vijayan, S., Rethy, A., Våpenstad, C., Solberg, O. V., Marvik, R., … & Hernes, T. N. (2012). Navigated laparoscopic ultrasound in abdominal soft tissue surgery: technological overview and perspectives. International Journal of Computer Assisted Radiology and Surgery, 7(4), 585-599.
-  Haouchine, N., Cotin, S., Peterlik, I., Dequidt, J., Lopez, M. S., Kerrien, E., & Berger, M. O. (2015). Impact of soft tissue heterogeneity on augmented reality for liver surgery. IEEE Transactions on Visualization & Computer Graphics, (1), 1-1.
-  Nichols, K. A., & Okamura, A. M. (2015). Methods to segment hard inclusions in soft tissue during autonomous robotic palpation. IEEE Transactions on Robotics, 31(2), 344-354.
-  Mahvash, M., Voo, L. M., Kim, D., Jeung, K., Wainer, J., & Okamura, A. M. (2008). Modeling the forces of cutting with scissors. IEEE Transactions on Biomedical Engineering, 55(3), 848-856.
-  Shademan, A., Decker, R. S., Opfermann, J. D., Leonard, S., Krieger, A., & Kim, P. C. (2016). Supervised autonomous robotic soft tissue surgery. Science Translational Medicine, 8(337), 337ra64.
-  Rateni, G., Cianchetti, M., Ciuti, G., Menciassi, A., & Laschi, C. (2015). Design and development of a soft robotic gripper for manipulation in minimally invasive surgery: a proof of concept. Meccanica, 50(11), 2855-2863.
-  Tai, K., El-Sayed, A. R., Shahriari, M., Biglarbegian, M., & Mahmud, S. (2016). State of the art robotic grippers and applications. Robotics, 5(2), 11.
-  Murali, A., Sen, S., Kehoe, B., Garg, A., McFarland, S., Patil, S., … & Goldberg, K. (2015, May). Learning by observation for surgical subtasks: Multilateral cutting of 3D viscoelastic and 2D orthotropic tissue phantoms. In Robotics and Automation (ICRA), 2015 IEEE International Conference on (pp. 1202-1209). IEEE.
-  Thananjeyan, B., Garg, A., Krishnan, S., Chen, C., Miller, L., & Goldberg, K. (2017, May). Multilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning. In Robotics and Automation (ICRA), 2017 IEEE International Conference on (pp. 2371-2378). IEEE.
-  Ritter, E. M., & Scott, D. J. (2007). Design of a proficiency-based skills training curriculum for the fundamentals of laparoscopic surgery. Surgical Innovation, 14(2), 107-112.
-  Dulan, G., Rege, R. V., Hogg, D. C., Gilberg-Fisher, K. M., Arain, N. A., Tesfay, S. T., & Scott, D. J. (2012). Developing a comprehensive, proficiency-based training program for robotic surgery. Surgery, 152(3), 477-488.
-  Stegemann, A. P., Ahmed, K., Syed, J. R., Rehman, S., Ghani, K., Autorino, R., … & Hassett, J. M. (2013). Fundamental skills of robotic surgery: a multi-institutional randomized controlled trial for validation of a simulation-based curriculum. Urology, 81(4), 767-774.
-  Fisher, R. A., Dasgupta, P., Mottrie, A., Volpe, A., Khan, M. S., Challacombe, B., & Ahmed, K. (2015). An over-view of robot assisted surgery curricula and the status of their validation. International Journal of Surgery, 13, 115-123.
-  Staub, C., Osa, T., Knoll, A., & Bauernschmitt, R. (2010, May). Automation of tissue piercing using circular needles and vision guidance for computer aided laparoscopic surgery. In Robotics and Automation (ICRA), 2010 IEEE International Conference on (pp. 4585-4590). IEEE.
-  Van Den Berg, J., Miller, S., Duckworth, D., Hu, H., Wan, A., Fu, X. Y., … & Abbeel, P. (2010, May). Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations. In Robotics and Automation (ICRA), 2010 IEEE International Conference on (pp. 2074-2081). IEEE.
-  Moustris, G. P., Hiridis, S. C., Deliparaschos, K. M., & Konstantinidis, K. M. (2011). Evolution of autonomous and semi‐autonomous robotic surgical systems: a review of the literature. The International Journal of Medical Robotics and Computer Assisted Surgery, 7(4), 375-392.
-  Prendergast, J. M., & Rentschler, M. E. (2016). Towards autonomous motion control in minimally invasive robotic surgery. Expert Review of Medical Devices, 13(8), 741-748.
-  Sen, S., Garg, A., Gealy, D. V., McKinley, S., Jen, Y., & Goldberg, K. (2016, May). Automating multi-throw multilateral surgical suturing with a mechanical needle guide and sequential convex optimization. In Robotics and Automation (ICRA), 2016 IEEE International Conference on (pp. 4178-4185). IEEE.
-  Crawford, N. R., Cicchini, S., & Johnson, N. (2017). Surgical robotic automation with tracking markers. U.S. Patent Application No. 15/609,334.
-  Shamaei, K., Che, Y., Murali, A., Sen, S., Patil, S., Goldberg, K., & Okamura, A. M. (2015, September). A paced shared-control teleoperated architecture for supervised automation of multilateral surgical tasks. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on (pp. 1434-1439). IEEE.
-  Osa, T., Sugita, N., & Mitsuishi, M. (2018). Online trajectory planning and force control for automation of surgical tasks. IEEE Transactions on Automation Science and Engineering, 15(2), 675-691.
-  Nguyen, T. T. (2018). A multi-objective deep reinforcement learning framework. arXiv preprint arXiv:1803.02965.
-  Nguyen, T., Nguyen, N. D., & Nahavandi, S. (2018). Multi-agent deep reinforcement learning with human strategies. arXiv preprint arXiv:1806.04562.
-  Yu, B., Tibebu, A. T., Stoyanov, D., Giannarou, S., Metzen, J. H., & Vander Poorten, E. (2016). Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions. International Journal of Computer Assisted Radiology and Surgery, 11(4), 553-568.
-  Du, Z., Wang, W., Yan, Z., Dong, W., & Wang, W. (2017). Variable admittance control based on fuzzy reinforcement learning for minimally invasive surgery manipulator. Sensors, 17(4), 844.
-  Nguyen, N. D., Nguyen, T., & Nahavandi, S. (2017). System design perspective for human-level agents using deep reinforcement learning: a survey. IEEE Access, 5, 27091-27102.
-  Khatami, A., Khosravi, A., Nguyen, T., Lim, C. P., & Nahavandi, S. (2017). Medical image analysis using wavelet transform and deep belief networks. Expert Systems with Applications, 86, 190-198.
-  Salaken, S. M., Khosravi, A., Nguyen, T., & Nahavandi, S. (2019). Seeded transfer learning for regression problems with deep learning. Expert Systems with Applications, 115, 565-577.
-  Khatami, A., Babaie, M., Tizhoosh, H. R., Khosravi, A., Nguyen, T., & Nahavandi, S. (2018). A sequential search-space shrinking using CNN transfer learning and a Radon projection pool for medical image retrieval. Expert Systems with Applications, 100, 224-233.
-  Nguyen, N. D., Nahavandi, S., & Nguyen, T. (2018). A human mixed strategy approach to deep reinforcement learning. arXiv preprint arXiv:1804.01874.
-  Chen, J., Lau, H. Y., Xu, W., & Ren, H. (2016, February). Towards transferring skills to flexible surgical robots with programming by demonstration and reinforcement learning. In Advanced Computational Intelligence (ICACI), 2016 Eighth International Conference on (pp. 378-384). IEEE.
-  Baek, D., Hwang, M., Kim, H., & Kwon, D. S. (2018, June). Path planning for automation of surgery robot based on probabilistic roadmap and reinforcement learning. In 2018 15th International Conference on Ubiquitous Robots (UR) (pp. 342-347). IEEE.
-  Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015, June). Trust region policy optimization. In International Conference on Machine Learning (pp. 1889-1897).
-  Courtecuisse, H., Allard, J., Kerfriden, P., Bordas, S. P., Cotin, S., & Duriez, C. (2014). Real-time simulation of contact and cutting of heterogeneous soft-tissues. Medical Image Analysis, 18(2), 394-410.
-  Gale, S., & Lewis, W. J. (2016). Patterning of tensile fabric structures with a discrete element model using dynamic relaxation. Computers & Structures, 169, 112-121.
-  Anantharaman, T., Campbell, M. S., & Hsu, F. H. (1990). Singular extensions: adding selectivity to brute-force searching. Artificial Intelligence, 43(1), 99-109.
-  Duan, Y., Chen, X., Houthooft, R., Schulman, J., & Abbeel, P. (2016, June). Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning (pp. 1329-1338).