On the journey towards complete autonomy in surgical robotics, automation of suturing has engrossed the community. There is good reason for this: suturing in Robot Assisted Minimally Invasive Surgery (RAMIS) has been cited as significantly more challenging and time consuming than manual suturing [7, 8], which makes its automation an attractive goal for improving a surgeon's quality of life during RAMIS. The emergence of surgical robotics simulators and open-sourced robotic platforms such as the da Vinci Research Kit (dVRK) provides the ability for rapid development towards realizing this goal.
Suturing requires a wide range of considerations for effective automation, including needle manipulation, knot tying, identification of entry and exit points for the suture throw, and interfacing the automation with the surgeon for effective deployment. In this work, we investigate the specific sub-challenge of needle manipulation from the perspective of bimanual regrasping. Fontanelli et al. cite that 74% of suture throws from the in-vivo JIGSAWS dataset required regrasping, which took on average 7.4s to complete. Previous work has also shown that proper grasping can enhance the effectiveness of suture throws [17, 16]. In fact, many planning techniques for suture needle throwing have even used task-specific gripper mechanisms to ensure proper grasping of a needle [9, 14, 24, 21]. While these mechanisms are effective in highlighting suture throwing capabilities, they are too task-specific to be deployed on a surgical robot.
On the other hand, previous work has approached the proper needle grasping problem by visual servoing or learning from demonstrations. However, since these works impose no collision-free constraints, they might not avoid the need to regrasp the needle after picking it up. Also, these methods are designed for a specific robot configuration and might require new calibration or training in another workspace. Therefore, regrasping suture needles to the optimal location for throwing on standard RAMIS tooling is still of great importance.
Standard practice for surgeons is to regrasp the suture needle roughly one third of the needle length from the suture thread and over off the needle as shown in Fig. 1. This gives an end-goal for the robot that can be given to a motion planner as a terminal state in order to find a collision-free trajectory for regrasping. However, in our survey of available state-of-the-art motion planners, all were too slow for this task and often timed out. This problem is not unexpected: planning times can typically reach up to 3 minutes in constrained environments, and neural planning methods are required to breach this limitation. These methods, however, are not directly developed for dual-arm robots, which are typical for RAMIS. In particular, the bases of the robotic arms are regularly adjusted between procedures, so a learning-based motion planner needs to generalize to all potential workspaces of the surgical robot.
To this end, we present a novel method for bimanual regrasping for suture needles. We use an RL approach integrated with rapid replanning to demonstrate real-time performance and success in regrasping. We structure the problem to be ego-centric to the needle and the end-effectors, making it far more generalizable than a typical configuration-space approach.
To plan for the needle passing task, the states and actions are defined in terms of the poses of the two end-effectors. The reference frames for these poses are not the world frame or the base frame of an arm. Instead, they are the frames of the end-effectors themselves, which change as the end-effectors move. This setting is referred to as the ego-centric setting, since the information that the end-effectors receive at each time step is centered on themselves.
There are several benefits of using the ego-centric setting. First, an ego-centric approach allows the policy to work in the global space without requiring the agent to explore the entire workspace. Instead, it only needs to observe the states relative to the goal, which are contained in a local space. Hence, no matter where the agent and the goal are in the world frame, as long as their relation has been learned by the policy, the agent is able to reach the goal. Also, since the policy does not consider the configuration of a robot, it can be used for other robot arms, even those with more degrees of freedom. These benefits suggest that the ego-centric setting is data efficient for learning a policy that can work in many different scenarios.
To sum up, we specifically present the following contributions in this work:
fast trajectory generation for bimanual needle regrasping for suture throwing via RL,
an RL training strategy that incorporates intermittent, targeted exploration to guide the policy while still allowing it to generalize, and
an ego-centric parameterization of the needle and surgical tools that generalizes the trajectory to arbitrary end-effector and robot base positioning.
While this work is directly aimed towards reducing the time needed to suture when conducting RAMIS, we also foresee it being an important step towards making other automated suture throw techniques deployable by relieving them of task-specific suture needle grasping mechanisms.
II Reinforcement Learning Background
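The suture needle passing task in this work is formulated as an RL problem and is modelled as a Markov Decision Process (MDP). The goal is to find a deterministic policy $\pi$ that solves this MDP. The following notations and definitions are used for the RL formulation. Let $(\mathcal{S}, \mathcal{A}, P, r, \rho_0, \gamma, T)$ be the tuple for the discrete-time finite-horizon MDP, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $P(s_{t+1} \mid s_t, a_t)$ is the transition probability, $r(s_t, a_t)$ is the reward function, $\rho_0$ is the initial state distribution, $\gamma$ is the discount factor, and $T$ is the task horizon. A policy $\pi: \mathcal{S} \to \mathcal{A}$ is learned to maximize the expected return $\mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t} r(s_t, a_t)\right]$, where $s_t \in \mathcal{S}$ is the state at time step $t$, $a_t \in \mathcal{A}$ is the action at $t$, and $a_t = \pi(s_t)$.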
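As a concrete reading of this objective, the return of a single episode is the discounted sum of its per-step rewards, and the expected return averages that quantity over episodes. The following is a minimal Python sketch; the function name `discounted_return` and the toy reward values are illustrative assumptions and do not come from the original work.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * r_t over one finite-horizon episode."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Toy episode (hypothetical values): small penalties, then a terminal reward of 0.
episode_rewards = [-0.1, -0.1, -0.1, 0.0]
print(discounted_return(episode_rewards, gamma=0.99))
```

With a discount factor below 1, penalties incurred later in an episode weigh slightly less than early ones, which is why the horizon and the per-step reward have to be tuned together.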
In order to develop and test suture needle passing, a simulated environment is developed in V-REP based on the previous work of Fontanelli et al.
III-A Initialization and Goal Generation for Needle Grasping
At the beginning of conducting needle passing, the needle can be grasped at a variety of points and directions. Therefore, the RL environment will be initialized with a random sample of an initial grasping point and direction such that the trained policy for motion planning will generalize well. In addition, the grasping point must be well defined. To model this mathematically, the following two coordinate frames are built on the needle.
III-A1 Needle frame
A Cartesian coordinate system is defined on the suture needle as shown in Fig. 1(a). This frame is referred to as the needle frame. The most common suture needles used in RAMIS are of a semicircle shape, hence these are considered in this work. The equation for the semicircle suture needle in the needle frame is defined as:
$$\mathbf{p}^{n}(\alpha) = \begin{bmatrix} r_n \cos\alpha & r_n \sin\alpha & 0 \end{bmatrix}^{\top} \qquad (1)$$
where $r_n$ is the radius of the needle and $\alpha$ is the angle on the needle. By randomly sampling an $\alpha$, a grasping point on the suture needle can be sampled. The pose of this grasping point in the needle frame is denoted as $[\mathbf{p}^{n}_{g}, \mathbf{q}^{n}_{g}]$, where $\mathbf{q}^{n}_{g} = \mathbf{q}_{I}$ and $\mathbf{p}^{n}_{g}$ is calculated by equation (1). Here, $\mathbf{q}_{I}$ is the identity rotation represented in quaternion form. Likewise, the goal grasping point can be defined by setting $\alpha$ to a goal value.
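The sampling of a grasping point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the needle arc is taken to lie in the x-y plane of the needle frame, centered at the origin, and the angle is drawn uniformly over the whole semicircle; the sampling range actually used by the authors, and the names `needle_point` and `sample_grasp_pose`, are hypothetical.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)


def needle_point(radius: float, angle: float) -> Vec3:
    """Point on the needle arc. ASSUMPTION: the arc lies in the x-y plane
    of the needle frame and is centered at the frame origin."""
    return (radius * math.cos(angle), radius * math.sin(angle), 0.0)


def sample_grasp_pose(radius: float, rng: random.Random) -> Tuple[Vec3, Quat]:
    """Sample a grasping point pose in the needle frame.
    ASSUMPTION: the angle is uniform over the full semicircle [0, pi]."""
    angle = rng.uniform(0.0, math.pi)
    identity: Quat = (1.0, 0.0, 0.0, 0.0)  # identity rotation, as in the text
    return needle_point(radius, angle), identity


position, orientation = sample_grasp_pose(radius=5.4, rng=random.Random(0))
print(position, orientation)
```

Fixing the angle instead of sampling it yields the goal grasping point in the same way.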
III-A2 Grasping point frame
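In order to sample a grasping direction pointing to a grasping point for initialization, a spherical coordinate system is defined with the origin at the grasping point as shown in Fig. 1(b). This frame is referred to as the grasping point frame. A point using Cartesian representation in this frame can be calculated as
$$\mathbf{p}^{g}(\rho, \theta, \phi) = \begin{bmatrix} \rho \sin\phi \cos\theta & \rho \sin\phi \sin\theta & \rho \cos\phi \end{bmatrix}^{\top} \qquad (2)$$
where $\rho, \theta, \phi$ are the radius, azimuth, and inclination respectively in the grasping point frame. This parameterization is beneficial for setting the grasping direction of a robotic gripper: $\rho$ defines the depth of the grasp with regard to the gripper, while $\theta$ and $\phi$ give the grasping angle relative to the needle. Therefore, by randomly sampling a $(\rho, \theta, \phi)$, the target position of the end-effector will be set to $\mathbf{p}^{g}_{e} = \mathbf{p}^{g}(\rho, \theta, \phi)$, and its orientation will become $\mathbf{q}^{g}_{e}$ such that the gripper points from $\mathbf{p}^{g}_{e}$ to the origin in the grasping point frame. $[\mathbf{p}^{g}_{e}, \mathbf{q}^{g}_{e}]$ can then be transformed to the needle frame by
$$\mathbf{T}(\mathbf{p}^{n}_{e}, \mathbf{q}^{n}_{e}) = \mathbf{T}(\mathbf{p}^{n}_{g}, \mathbf{q}^{n}_{g})\, \mathbf{T}(\mathbf{p}^{g}_{e}, \mathbf{q}^{g}_{e}) \qquad (3)$$
where $\mathbf{T}(\cdot)$ is the homogeneous representation of a pose and $[\mathbf{p}^{n}_{g}, \mathbf{q}^{n}_{g}]$ is the pose of the grasping point in the needle frame. Then the end-effector can be set to reach $[\mathbf{p}^{n}_{e}, \mathbf{q}^{n}_{e}]$, hence grasping the needle, through inverse kinematics. The goal grasping direction can be set in a similar fashion. In the following sections, the end-effector that is initialized to hold a needle is referred to as the grasping end-effector, and the one that approaches the goal is referred to as the regrasping end-effector.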
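The two steps described above, converting spherical grasp parameters to a Cartesian point and then mapping that point into the needle frame, can be sketched as follows. This is a minimal illustration, assuming the common physics convention for spherical coordinates and an identity rotation for the grasping point frame; all helper names and numeric values are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = List[List[float]]


def spherical_to_cartesian(rho: float, azimuth: float, inclination: float) -> Vec3:
    """Cartesian point from (radius, azimuth, inclination).
    ASSUMPTION: inclination is measured from the +z axis."""
    s = math.sin(inclination)
    return (rho * s * math.cos(azimuth),
            rho * s * math.sin(azimuth),
            rho * math.cos(inclination))


def approach_direction(point: Vec3) -> Vec3:
    """Unit vector pointing from `point` back to the frame origin."""
    norm = math.sqrt(sum(c * c for c in point))
    return (-point[0] / norm, -point[1] / norm, -point[2] / norm)


def translation(offset: Vec3) -> Mat4:
    """Homogeneous transform with identity rotation and the given translation."""
    return [[1.0, 0.0, 0.0, offset[0]],
            [0.0, 1.0, 0.0, offset[1]],
            [0.0, 0.0, 1.0, offset[2]],
            [0.0, 0.0, 0.0, 1.0]]


def transform_point(transform: Mat4, point: Vec3) -> Vec3:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    h = (point[0], point[1], point[2], 1.0)
    rows = [sum(transform[i][j] * h[j] for j in range(4)) for i in range(3)]
    return (rows[0], rows[1], rows[2])


# Hypothetical numbers: a grasping point on the needle and one sampled direction.
grasp_point_in_needle = (5.4, 0.0, 0.0)
tool_in_grasp_frame = spherical_to_cartesian(10.0, math.pi / 4, math.pi / 3)
tool_in_needle = transform_point(translation(grasp_point_in_needle), tool_in_grasp_frame)
print(tool_in_needle, approach_direction(tool_in_grasp_frame))
```

Only the position is transformed here; a full implementation would also build the end-effector orientation from the approach direction and compose it in the same way.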
III-B Ego-Centric State and Action Space
With the ego-centric setting, the states in the state space $\mathcal{S}$ and actions in the action space $\mathcal{A}$ for the needle passing task are defined as follows. A state includes
the position and quaternion of the needle measured in the grasping end-effector frame, $\mathbf{p}^{e_g}_{n}$, $\mathbf{q}^{e_g}_{n}$,
the position and quaternion of the regrasping end-effector measured in the grasping end-effector frame, $\mathbf{p}^{e_g}_{e_r}$, $\mathbf{q}^{e_g}_{e_r}$, and
the position and quaternion of the needle measured in the regrasping end-effector frame, $\mathbf{p}^{e_r}_{n}$, $\mathbf{q}^{e_r}_{n}$.
After initializing the environment, $[\mathbf{p}^{e_g}_{n}, \mathbf{q}^{e_g}_{n}]$ is available by calculating the inverse of equation (3). For simplicity, in this work it is assumed that only the regrasping arm moves during planning. Yet, the ego-centric setting can be directly applied to two moving arms. Moreover, since a state does not depend on the joint angles or the base frame of the robot, two very different configurations of a robot arm can have the same state as shown in Fig. 3.
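The claim that an ego-centric state is independent of the base or world frame can be checked numerically. The sketch below computes the pose of one end-effector in the frame of the other and shows that it is unchanged when both poses are moved by the same rigid transform. It uses rotation matrices rather than quaternions for brevity, and all names and numbers are illustrative assumptions.

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Pose = Tuple[Mat3, Vec3]  # (rotation matrix, translation)


def rot_z(angle: float) -> Mat3:
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def compose(first: Pose, second: Pose) -> Pose:
    """Chain two rigid transforms: result = first * second."""
    r1, t1 = first
    r2, t2 = second
    rotated = mat_vec(r1, t2)
    return mat_mul(r1, r2), [rotated[i] + t1[i] for i in range(3)]


def inverse(pose: Pose) -> Pose:
    r, t = pose
    r_t = [[r[j][i] for j in range(3)] for i in range(3)]
    return r_t, [-m for m in mat_vec(r_t, t)]


def relative_pose(frame: Pose, target: Pose) -> Pose:
    """Pose of `target` measured in `frame`; both inputs share a common frame."""
    return compose(inverse(frame), target)


grasping = (rot_z(0.3), [0.10, 0.00, 0.05])
regrasping = (rot_z(-0.8), [0.12, 0.03, 0.06])
world_shift = (rot_z(1.2), [1.0, -2.0, 0.5])  # e.g. the robot bases were repositioned

before = relative_pose(grasping, regrasping)
after = relative_pose(compose(world_shift, grasping), compose(world_shift, regrasping))
print(before[1], after[1])  # the two translations agree up to rounding
```

Because the common transform cancels, a policy that only sees relative poses receives the same input wherever the arms are placed in the room.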
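An action is defined as the variation of the position and quaternion of the regrasping end-effector measured in the grasping end-effector frame, $\Delta\mathbf{p}_{t}$, $\Delta\mathbf{q}_{t}$. Given that the regrasping end-effector moves from $[\mathbf{p}^{e_g}_{e_r,t}, \mathbf{q}^{e_g}_{e_r,t}]$ to $[\mathbf{p}^{e_g}_{e_r,t+1}, \mathbf{q}^{e_g}_{e_r,t+1}]$ at time step $t$, the action would be
$$\Delta\mathbf{p}_{t} = \mathbf{p}^{e_g}_{e_r,t+1} - \mathbf{p}^{e_g}_{e_r,t} \qquad (4)$$
$$\Delta\mathbf{q}_{t} = \mathbf{q}^{e_g}_{e_r,t+1} \otimes \left(\mathbf{q}^{e_g}_{e_r,t}\right)^{-1} \qquad (5)$$
which would eventually be used as the control commands to plan with in this ego-centric setting.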
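One way to realize such an incremental action in code is sketched below. The composition order of the quaternion increment (left-multiplying the current orientation) is an assumption, as are the helper names `pose_delta` and `apply_delta`.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit norm


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_conj(q: Quat) -> Quat:
    """Conjugate, which equals the inverse for a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])


def pose_delta(p_now: Vec3, q_now: Quat, p_next: Vec3, q_next: Quat) -> Tuple[Vec3, Quat]:
    """Action that moves the pose (p_now, q_now) to (p_next, q_next)."""
    dp = (p_next[0] - p_now[0], p_next[1] - p_now[1], p_next[2] - p_now[2])
    dq = quat_mul(q_next, quat_conj(q_now))  # ASSUMED order: dq * q_now = q_next
    return dp, dq


def apply_delta(p_now: Vec3, q_now: Quat, dp: Vec3, dq: Quat) -> Tuple[Vec3, Quat]:
    """Pose reached after applying an action to the current pose."""
    p_next = (p_now[0] + dp[0], p_now[1] + dp[1], p_now[2] + dp[2])
    return p_next, quat_mul(dq, q_now)


q_start = (math.cos(0.15), 0.0, 0.0, math.sin(0.15))  # 0.3 rad about z
q_end = (math.cos(0.25), 0.0, 0.0, math.sin(0.25))    # 0.5 rad about z
action = pose_delta((0.0, 0.0, 0.0), q_start, (1.0, 2.0, 3.0), q_end)
print(apply_delta((0.0, 0.0, 0.0), q_start, *action))
```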
III-C Learning a Policy by Deep Deterministic Policy Gradients with Demonstrations
To solve the suture needle passing task, we train a policy by Deep Deterministic Policy Gradients (DDPG) with behavior cloning (BC). DDPG is one of the most widely used RL algorithms for continuous control, and its combination with BC helps guide the exploration of an agent. Since passing a needle requires delicate motions, especially when the two end-effectors are close to each other, it is reasonable to incorporate demonstrations into learning to encourage precise motions and speed up training.
The states and actions of this task are defined by the ego-centric setting as mentioned in the previous section. Also, the rewards are defined as follows:
Equations (7) and (8) describe the distance between the current pose and the goal grasping pose . transforms a quaternion to its axis-angle representation. is a value that can be tuned. This value should be small enough to prevent the agent from not approaching the needle, while it cannot be too small, or the agent might choose to collide with the needle instead of roaming to find a feasible path. Usually, the value of is tuned according to the reward of collision, which is -1 here, and the maximum number of time steps that the agent is allowed to run in the environment.
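To make the trade-off described above concrete, the following is a hedged sketch of a sparse reward of this general kind. Only the collision reward of -1 is taken from the text; the per-step penalty, the goal reward of 0, the tolerances, and all function names are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def position_distance(p: Sequence[float], p_goal: Sequence[float]) -> float:
    """Euclidean distance between two positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_goal)))


def orientation_distance(q: Sequence[float], q_goal: Sequence[float]) -> float:
    """Rotation angle (rad) between two unit quaternions, i.e. the norm of the
    axis-angle vector of their relative rotation."""
    dot = abs(sum(a * b for a, b in zip(q, q_goal)))
    return 2.0 * math.acos(min(1.0, dot))


def reward(collided: bool,
           p: Sequence[float], q: Sequence[float],
           p_goal: Sequence[float], q_goal: Sequence[float],
           step_penalty: float = 0.01,   # hypothetical tunable value
           pos_tol: float = 1.0,         # hypothetical position tolerance
           rot_tol: float = 0.1) -> float:  # hypothetical angle tolerance (rad)
    if collided:
        return -1.0                      # collision reward stated in the text
    at_goal = (position_distance(p, p_goal) < pos_tol
               and orientation_distance(q, q_goal) < rot_tol)
    return 0.0 if at_goal else -step_penalty


identity = (1.0, 0.0, 0.0, 0.0)
print(reward(False, (5.0, 0.0, 0.0), identity, (0.0, 0.0, 0.0), identity))
```

In this sketch, a horizon of 100 steps with a penalty of 0.01 accumulates to about -1, the same magnitude as a collision. This illustrates why the per-step value has to be balanced against both the collision reward and the maximum number of time steps.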
III-C1 Generating expert demonstrations
Generating expert demonstrations can be time consuming if we ask a real expert to do so. However, the goal position and orientation of holding a suture needle can be calculated from equation (3) by setting target grasp parameters. Therefore, the expert demonstrations can be generated automatically in the simulation environment.
An intuitive way to do this is to apply some motion planning algorithm to a given start configuration and a goal grasping configuration. Yet, in this way, the feasible space is narrower near the goal, so a gripper is more likely to collide with the needle or another gripper. This leads to failure for planning algorithms to find a feasible path in a given time period.
Fig. 4 shows the scenario and the probability of no collision as the regrasping gripper moves closer to the goal. From this figure, it can be observed that the feasible space looks like a funnel with the narrow part near the goal. Note that this space shrinks considerably within a small distance, so planning requires high precision. This makes it more difficult for a motion planning algorithm to find a path without collision. To overcome this difficulty, instead of planning from a starting point to a goal grasping point, a path can be planned in reverse, i.e., escaping a funnel. Based on this concept, the expert demonstrations are generated by applying a motion planning algorithm from the goal grasping point to a randomly sampled starting point in the free space. The planned path is then reversed to be used for learning a policy.
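The reverse-planning idea can be sketched independently of any particular planner. In the Python sketch below, `straight_line_planner` is a toy stand-in for a real sampling-based planner, and the conversion of waypoints into (state, action) pairs is an assumed format for the demonstrations; neither is taken from the original implementation.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
Planner = Callable[[Point, Point], List[Point]]


def straight_line_planner(start: Point, goal: Point, steps: int = 5) -> List[Point]:
    """Toy stand-in for a real planner: linear interpolation, no obstacles."""
    return [tuple(s + (g - s) * k / steps for s, g in zip(start, goal))
            for k in range(steps + 1)]


def plan_in_reverse(planner: Planner, start: Point, goal: Point) -> List[Point]:
    """Plan from the goal outwards (escaping the funnel), then flip the path
    so that it runs from the start to the goal."""
    escape_path = planner(goal, start)
    return list(reversed(escape_path))


def to_demonstration(path: List[Point]) -> List[Tuple[Point, Point]]:
    """Turn waypoints into (state, action) pairs with action = next - current."""
    return [(cur, tuple(n - c for n, c in zip(nxt, cur)))
            for cur, nxt in zip(path, path[1:])]


path = plan_in_reverse(straight_line_planner, (0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
demonstration = to_demonstration(path)
print(path[0], path[-1], len(demonstration))
```

The planner itself never has to enter the narrow end of the funnel; it starts inside it, where any small motion away from the goal tends to increase the free space.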
III-C2 Applying demonstrations via an Active Learning approach
Active learning involves collecting demonstrations during training. It is used to demonstrate successful episodes only when the RL policy fails, which helps guide the exploration of the robot and significantly reduces random exploration.
This strategy, referred to as targeted exploration in prior work, can be easily applied to our work. However, in our experiments it is observed that eliminating random exploration slows down the learning. The main reason is that the expert, which is a motion planning algorithm, is not perfect. It is not guaranteed that this algorithm can always provide a feasible path, and relying almost entirely on targeted exploration can instead hurt the performance. Therefore, to make an agent explore in a more effective way, random exploration is kept, and targeted exploration is gradually introduced to training. This exploration strategy is referred to as mixed exploration.
When generating episodes with mixed exploration, if an episode fails, then with probability $p_{\text{target}}$, a motion planning algorithm will generate a demonstration for this episode. $p_{\text{target}}$ is the probability of applying targeted exploration and will increase from its initial value to 1. Meanwhile, to stabilize training, the probability of random exploration $p_{\text{random}}$ will decrease from its initial value to 0.
Algorithm 1 describes the process of generating episodes with mixed exploration. After generating episodes, $p_{\text{target}}$ and $p_{\text{random}}$ will be updated by
where $\Delta p_{\text{target}}$ and $\Delta p_{\text{random}}$ are the changes applied to $p_{\text{target}}$ and $p_{\text{random}}$ respectively.
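A schedule of this kind can be sketched as below. The additive update with clipping to the interval [0, 1], the class name `MixedExploration`, and the numeric values are assumptions made for illustration; the original update rule and settings are not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class MixedExploration:
    p_target: float      # probability of requesting a demonstration after a failure
    p_random: float      # probability of taking a random exploratory action
    delta_target: float  # ASSUMED additive increase per update
    delta_random: float  # ASSUMED additive decrease per update

    def use_demonstration(self, episode_failed: bool, rng: random.Random) -> bool:
        """Ask the planner for a demonstration only for failed episodes."""
        return episode_failed and rng.random() < self.p_target

    def use_random_action(self, rng: random.Random) -> bool:
        return rng.random() < self.p_random

    def step(self) -> None:
        """Update both probabilities after a batch of episodes."""
        self.p_target = min(1.0, self.p_target + self.delta_target)
        self.p_random = max(0.0, self.p_random - self.delta_random)


schedule = MixedExploration(p_target=0.0, p_random=0.3,
                            delta_target=0.1, delta_random=0.05)
for _ in range(3):
    schedule.step()
print(schedule.p_target, schedule.p_random)
```

Early in training the agent mostly explores at random; as the schedule advances, failed episodes are increasingly replaced by planner demonstrations.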
IV Experiment Setup
A series of experiments is conducted in order to evaluate the performance of the proposed methods. First, the training and testing settings are defined, followed by a comparison of different exploration strategies for RL training. The learned motion planner from RL is also compared against classical motion planners. Lastly, the policy is implemented and tested on a real robot to pass a suture needle of radius 5.4mm, and all regrasps are done in a single pass.
The needle selected for simulation is of radius mm. To resemble the preferred grasping position and orientation of surgeons before throwing the suture needle, the goal is generated by setting and , , in the needle and grasping point frames respectively. Meanwhile, the randomized initialization for the initial grasping is generated by sampling uniformly from and , , in the needle and grasping point frames respectively. The regrasping end-effector is randomly initialized to be 13mm away from the center of the needle with a position variation and an orientation variation along some rotation axis.
is a uniform distribution. Expert demonstrations for both the behavioral cloning and mixed exploration are generated using batch informed trees (BIT*) due to its short planning time. The maximum horizon of the RL environment is 100, in the reward function is tuned to be , and . For the upcoming comparison studies, a test set of 300 randomly initialized grasps is generated and used for all the corresponding results.
The implemented DDPG + BC is based on the OpenAI Baselines implementation. The actor and critic neural networks in DDPG are both multilayer perceptrons with 3 layers and 512 neurons per layer. They are trained for a total of 500 epochs. Each epoch contains 10 iterations, where in each iteration, 5 episodes are collected in parallel to fill a replay buffer with a set size of. Per iteration, the actor and critic networks are updated a total of 200 times with a batch size of 256 and learning rate of for both the actor and critic. Meanwhile, the target actor and critic networks are updated only once every 40 of those updates with a coefficient value of 0.95 for Polyak averaging. The discount factor, $\gamma$, is set to 0.99, and an additional quadratic penalty for the actions is applied with a coefficient of 1. The coefficients for the primary and cloning losses of the actor are both set to . The total number of demonstration episodes generated for BC is 9900.
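The target-network schedule described above can be sketched in plain Python. The update convention used here, where the target keeps a fraction equal to the Polyak coefficient of its old value, is the common one and is assumed rather than confirmed for this setup; the "gradient step" is a placeholder.

```python
from typing import List


def polyak_update(target: List[float], main: List[float], polyak: float) -> List[float]:
    """ASSUMED convention: target <- polyak * target + (1 - polyak) * main."""
    return [polyak * t + (1.0 - polyak) * m for t, m in zip(target, main)]


main_params = [0.0]    # stand-in for the actor or critic weights
target_params = [0.0]  # stand-in for the corresponding target network
target_updates = 0

for update in range(1, 201):  # 200 network updates per iteration (from the text)
    main_params = [w + 0.01 for w in main_params]  # placeholder for a gradient step
    if update % 40 == 0:      # target refreshed once every 40 updates (from the text)
        target_params = polyak_update(target_params, main_params, polyak=0.95)
        target_updates += 1

print(target_updates, target_params, main_params)
```

With a coefficient of 0.95 and infrequent refreshes, the target network trails the main network slowly, which is the usual way to stabilize the critic's bootstrapped targets.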
V-A Solving Needle Passing with Mixed Exploration
To compare the proposed RL exploration algorithm with other methods, three separate policies are trained with the following settings in the needle passing environment:
DDPG + BC  where: ,
DDPG + BC with targeted exploration  where: , , ,
DDPG + BC with mixed exploration where: , , ,
The DDPG + BC portion in all three methods uses the same network architecture and training hyper-parameters as previously listed.
|Algorithm||Success rate||Planning time||Path length|
The training curves are shown in Fig. 5, and the performance of the best trained policies is shown in Table I. DDPG + BC (mixed) outperforms the other two methods with regard to success rate and planning time. DDPG + BC (targeted) does reach a higher success rate at the beginning, but becomes unstable as training goes on because it relies more heavily on imperfect experts. Therefore, DDPG + BC is eventually able to surpass DDPG + BC (targeted) in success rate.
V-B Comparing Against Sampling-Based Motion Planners
To show that DDPG + BC (mixed) can solve the suture needle passing task effectively and efficiently, a comparison against sampling-based motion planning algorithms is conducted. These algorithms include probabilistic roadmaps with the star strategy (PRM*) [12, 11], rapidly exploring random trees with the star strategy (RRT*), bidirectional fast marching trees (BFMT*), and BIT*. All of these algorithms are implemented on the needle passing environment using the Open Motion Planning Library (OMPL). These are top performing planners in OMPL. Table II summarizes their respective performances alongside the proposed learning-based motion planning strategy in the needle passing environment. The proposed method performs significantly better with regard to success rate and planning time. The planners are also tested in reverse to highlight the speed-up and improved success rate when doing so, hence showing why BIT* in reverse, which performed the best of the sampling-based planners, is used to generate demonstrations for the RL training.
|Algorithm||Success rate||Planning time||Path length|
V-C Real World Experiment
The best trained RL policy is tested in the real world on a da Vinci Research Kit (dVRK) with a suture needle of radius 5.4mm. The needle is initially grasped by one of the Patient Side Manipulator (PSM) arms of the dVRK using a Large Needle Driver (LND). The end-effector of this PSM arm is the grasping end-effector. The end-effector of another PSM arm with an LND acts as the regrasping end-effector by following the RL policy to regrasp the needle. In order to provide the state information for the RL policy, the dVRK's stereo-endoscope, which has 1080p resolution and runs at 30fps, is used. Both PSM arms are tracked from the stereo-endoscope using our previous work, which gives the pose of the end-effectors in the camera frame. Let $\mathbf{T}^{c}_{e_g}$ and $\mathbf{T}^{c}_{e_r}$ be the poses, in homogeneous representation, of the real-time tracked grasping and regrasping end-effectors respectively in the camera frame. Then the ego-centric state for the regrasping end-effector can be computed as:
$$\mathbf{T}^{e_g}_{e_r} = \left(\mathbf{T}^{c}_{e_g}\right)^{-1} \mathbf{T}^{c}_{e_r} \qquad (11)$$
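In our previous work, we also showed effective control of the end-effector in the camera frame. Therefore, each action generated by the policy is converted to a target pose $\mathbf{T}^{c}_{\text{target}}$ in the camera frame by solving for it using equations (4), (5), and (11). Then, following our previously developed controller, the end-effector of the regrasping arm in its own base frame is set to the pose $\mathbf{T}^{b}_{\text{target}}$ via dVRK's built-in inverse kinematics, which is computed by:
$$\mathbf{T}^{b}_{\text{target}} = \mathbf{T}^{b}_{c}\, \mathbf{T}^{c}_{\text{target}} \qquad (12)$$
where $\mathbf{T}^{b}_{c}$, the transform from the camera frame to the base frame of the regrasping arm, is updated in real-time by our previous surgical tool tracking method.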
Similarly, the needle is reconstructed using a previously proposed technique, which gives the pose of the needle in the camera frame. Let $\mathbf{T}^{c}_{n}$ be the reconstructed pose of the needle. Combined with the previously described end-effector tracking, the reconstructed needle pose can be transformed to the grasping and regrasping end-effector frames by:
$$\mathbf{T}^{e_g}_{n} = \left(\mathbf{T}^{c}_{e_g}\right)^{-1} \mathbf{T}^{c}_{n} \qquad (13)$$
$$\mathbf{T}^{e_r}_{n} = \left(\mathbf{T}^{c}_{e_r}\right)^{-1} \mathbf{T}^{c}_{n} \qquad (14)$$
where $\mathbf{T}^{c}_{e_g}$ and $\mathbf{T}^{c}_{e_r}$ are the tracked poses of the grasping and regrasping end-effectors in the camera frame, hence giving the last components of the state information for the policy from the real world. However, we experimentally found the reconstruction to be too inaccurate to apply directly, as shown in Fig. 6. This highlights that needle reconstruction is the weak point of transferring the RL policy to the real world. Therefore, to map the reconstructed pose to a valid initial grasp, this pose is compared to a set of 1000 valid initial grasps collected from the simulated scene beforehand. The pose in this set that is closest to the reconstructed needle pose is used for the policy. The distance is calculated in the same manner as equations (7) and (8).
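The snapping of a noisy reconstruction onto the nearest valid grasp amounts to a nearest-neighbour lookup over poses. A minimal sketch follows; combining the position and orientation distances as a weighted sum, the weight itself, and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[Sequence[float], Sequence[float]]  # (position, unit quaternion (w, x, y, z))


def pose_distance(a: Pose, b: Pose, rot_weight: float = 1.0) -> float:
    """Position distance plus weighted rotation angle between two poses.
    ASSUMPTION: the two terms are combined as a weighted sum."""
    pos = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0], b[0])))
    dot = abs(sum(x * y for x, y in zip(a[1], b[1])))
    rot = 2.0 * math.acos(min(1.0, dot))
    return pos + rot_weight * rot


def closest_valid_grasp(query: Pose, valid: List[Pose], rot_weight: float = 1.0) -> Pose:
    """Return the precomputed valid grasp nearest to a noisy reconstruction."""
    return min(valid, key=lambda grasp: pose_distance(query, grasp, rot_weight))


identity = (1.0, 0.0, 0.0, 0.0)
valid_grasps = [((0.0, 0.0, 0.0), identity),
                ((1.0, 0.0, 0.0), identity),
                ((5.0, 5.0, 0.0), identity)]
noisy_reconstruction = ((0.9, 0.1, 0.0), identity)
print(closest_valid_grasp(noisy_reconstruction, valid_grasps))
```

A linear scan is adequate for a set of about 1000 grasps; a spatial index would only matter for much larger sets.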
In the real-world experiments, the poses of the initial needle grasp and the initial end-effectors are manually randomized. The complete success rate for needle passing is 73.3% over 15 trials, with an average planning time of 0.0846s and an average run time of 5.1454s. Several example regrasps are shown in Fig. 1. In these trials, the main failures are caused by the needle reconstruction. To test this, a secondary experiment is conducted where the initial grasp is preset to a known position and orientation. During this experiment, the success rate of needle passing is 90.5% over 21 trials, with an average planning time of 0.0807s and an average run time of 2.8801s. In this case, the main failures come from the arm reaching a joint limit or the needle being regrasped too far inside the gripper, which are shown in Fig. 7.
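As a quick arithmetic check, the two reported rates are consistent with whole-number success counts. The counts below are inferred from the percentages and trial numbers and are not stated explicitly in the text:

```latex
\frac{11}{15} \approx 73.3\%, \qquad \frac{19}{21} \approx 90.5\%
```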
VI Discussion and Conclusion
In this work, we present a novel method for trajectory generation to conduct suture needle regrasping. Regrasping is a critical and time consuming task during RAMIS. It is critical since the suture needle needs to be properly oriented and positioned to conduct an effective throw. The proposed work can be combined with the large body of work in automatic needle throwing to complete the autonomous suturing procedure. Moving forward, the largest contributor to failed grasps is inaccurate needle reconstruction. This will be addressed by applying Bayesian estimation under constraints, where the constraints will be the feasible grasping positions and orientations described in this work.
This research was supported by the Telemedicine and Advanced Technology Research Center (TATRC) T.R.O.N. program.
- (2018) Automated pick-up of suturing needles for robotic surgical assistance. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1370–1377.
- (2017) OpenAI Baselines. GitHub. Note: https://github.com/openai/baselines
- (2018) A new laparoscopic tool with in-hand rolling capabilities for needle reorientation. IEEE Robotics and Automation Letters 3 (3), pp. 2354–2361.
- (2018) A V-REP simulator for the da Vinci Research Kit robotic platform. In BioRob.
- (2015) Batch informed trees (BIT*): sampling-based optimal planning via the heuristically guided search of implicit random geometric graphs. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 3067–3074.
- (2014) JHU-ISI gesture and skill assessment working set (JIGSAWS): a surgical activity dataset for human motion modeling. In MICCAI Workshop: M2CAI, Vol. 3, pp. 3.
- (1998) Manual vs robotically assisted laparoscopic surgery in the performance of basic manipulation and suturing tasks. Archives of Surgery 133 (9), pp. 957–961.
- (2003) A performance study comparing manual and robotically assisted laparoscopic surgery using the da Vinci system. Surgical Endoscopy and Other Interventional Techniques 17 (10), pp. 1595–1599.
- (2013) Needle path planning for autonomous robotic surgical suturing. In 2013 IEEE International Conference on Robotics and Automation, pp. 1669–1675.
- (2019) Harnessing reinforcement learning for neural motion planning. arXiv preprint arXiv:1906.00214.
- (2011) Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research 30 (7), pp. 846–894.
- (1996) Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation 12 (4), pp. 566–580.
- (2014) An open-source research kit for the da Vinci® surgical system. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 6434–6439.
- (2014) Smart tissue anastomosis robot (STAR): a vision-guided robotics system for laparoscopic suturing. IEEE Transactions on Biomedical Engineering 61 (4), pp. 1305–1317.
- (2020) SuPer: a surgical perception framework for endoscopic tissue manipulation with surgical robotics. IEEE Robotics and Automation Letters 5 (2), pp. 2294–2301.
- (2015) Optimal needle grasp selection for automatic execution of suturing tasks in robotic minimally invasive surgery. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2894–2900.
- (2016) Needle grasp and entry port selection for automatic execution of suturing tasks in robotic minimally invasive surgery. IEEE Transactions on Automation Science and Engineering 13 (2), pp. 552–563.
- (2002) TRIP: a low-cost vision-based location system for ubiquitous computing. Personal and Ubiquitous Computing 6 (3), pp. 206–219.
- (2008) A system for robotic heart surgery that learns to tie knots using recurrent neural networks. Advanced Robotics 22 (13-14), pp. 1521–1537.
- (2018) Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6292–6299.
- (2017) Autonomous suturing via surgical robot: an algorithm for optimal selection of needle diameter, shape, and path. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2391–2398.
- (2020) Motion planning networks: bridging the gap between learning-based and classical motion planners. IEEE Transactions on Robotics.
- (2019) Open-sourced reinforcement learning environments for surgical robotics. arXiv preprint arXiv:1903.02090.
- (2016) Automating multi-throw multilateral surgical suturing with a mechanical needle guide and sequential convex optimization. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4178–4185.
- (2010) Kalman filtering with state constraints: a survey of linear and nonlinear algorithms. IET Control Theory & Applications 4 (8), pp. 1303–1318.
- (2015) An asymptotically-optimal sampling-based algorithm for bi-directional motion planning. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2072–2078.
- (2012) The Open Motion Planning Library. IEEE Robotics & Automation Magazine 19 (4), pp. 72–82. Note: https://ompl.kavrakilab.org
- (2019) Automated extraction of surgical needles from tissue phantoms. In 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), pp. 170–177.
- (2020) Collaborative suturing: a reinforcement learning approach to automate hand-off task in suturing for surgical robots. In 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1380–1386.
- (2017) Single-master dual-slave surgical robot with automated relay of suture needle. IEEE Transactions on Industrial Electronics 65 (8), pp. 6343–6351.
- (2018) Robot autonomy for surgery. In The Encyclopedia of Medical Robotics, pp. 281–313.