Learning to Localize, Grasp, and Hand Over Unmodified Surgical Needles

12/08/2021
by   Albert Wilcox, et al.
UC Berkeley

Robotic Surgical Assistants (RSAs) are commonly used to perform minimally invasive surgeries by expert surgeons. However, long procedures filled with tedious and repetitive tasks such as suturing can lead to surgeon fatigue, motivating the automation of suturing. As visual tracking of a thin reflective needle is extremely challenging, prior work has modified the needle with nonreflective contrasting paint. As a step towards automation of a suturing subtask without modifying the needle, we propose HOUSTON: Handoff of Unmodified, Surgical, Tool-Obstructed Needles, a problem and algorithm that uses a learned active sensing policy with a stereo camera to localize and align the needle into a visible and accessible pose for the other arm. To compensate for robot positioning and needle perception errors, the algorithm then executes a high-precision grasping motion that uses multiple cameras. In physical experiments using the da Vinci Research Kit (dVRK), HOUSTON successfully passes unmodified surgical needles with a success rate of 96.7% and can hand over the needle sequentially between the arms 32.4 times on average before failure. On needles unseen in training, HOUSTON achieves a success rate of 75% - 92.9%. To our knowledge, this work is the first to study handover of unmodified surgical needles. See https://tinyurl.com/houston-surgery for additional materials.


I Introduction

Robotic Surgical Assistants (RSAs) currently rely on human supervision for the entirety of surgical tasks, which can consist of many repetitive subtasks such as suturing. Automation of surgical subtasks may reduce surgeon fatigue [yip2017robot], with initial results in surgical cutting [thananjeyan2017multilateral, murali2015learning], debridement [seita_icra_2018, murali2015learning], suturing [sen2016automating, superhuman_dict, thananjeyan2019safety, chiu2020bimanual, saeidi_suturing_icra_2019, extraction_needles_2019, automated_needle_pickup_2018, improved_knots_case_2013], hemostasis [ritcher_bloodflow_2020], and peg transfer [paradis2020intermittent, hwang2020applying, hwang2020efficiently, auto_peg_transfer_2015]. This paper considers automation of the bimanual regrasping subtask [chiu2020bimanual] of surgical suturing, which involves passing a surgical needle from one end effector to another. This handover motion is performed in between stitches during suturing and is a critical step, as accurately positioning the needle in the end effector affects the stability of its trajectory when guided through tissue. Because varying tension in the cables that drive the arms causes inaccuracies in motion, a high-precision task such as passing a needle between the end effectors is challenging [mahler2014case, hwang2020applying, paradis2020intermittent, hwang2020efficiently, peng2020real]. The task is also difficult because 3D pose information is critical for successfully manipulating needles, and surgical needles are challenging to perceive with RGB or active depth sensors due to their reflective surface, thin profile [kollar2021simnet], and self-occlusions (Figure 2). Prior work has mitigated this by painting needles [sen2016automating, chiu2020bimanual] and using color segmentation, but this solution is not practical for clinical use.

To manipulate unmodified surgical needles, we combine recent advances in deep learning, active sensing, and visual servoing. We present HOUSTON: Handoff of Unmodified, Surgical, Tool-Obstructed Needles, a problem and algorithm that uses stereo vision with coarse and fine-grained control policies (Figure 1) to sequentially localize, orient, and hand over unmodified surgical needles. We present a localization method using stereo RGB with a deep segmentation network to output a point cloud of the needle in the workspace. This point cloud is used to define a coarse robot policy that uses visual servoing to reorient the needle for handover in a pose that is visible to the cameras and accessible by the other end effector. However, due to inaccuracies of robot positioning and the perception system, further corrections may be necessary. We train a fine robot policy from a small set of human demonstrations to perform these subtle but critical corrections from images for unmodified surgical needles.

Fig. 1: Algorithm overview: The algorithm first servos the needle into a position that is easily visible to the stereo camera to produce a high-confidence pose estimate. Using this estimate, it coarsely reorients the needle towards the grasping arm, and then refines this iteratively to correct for positioning errors. Finally, it executes a learned visual servoing grasping policy to complete the handover.

This paper makes the following contributions:

  1. A perception pipeline using stereo RGB to accurately estimate the pose of surgical steel needles in 3D space, enabling needle manipulation without active depth sensors or painted needles.

  2. A visual servoing algorithm to perform coarse reorientation of a surgical needle for grasping.

  3. A needle grasping policy that performs fine control of the needle learned from a small set of human demonstrations to compensate for robot positioning and needle sensing inaccuracies.

  4. Combination of the pose estimator (1), the servoing algorithm (2), and the needle controller (3) to perform bimanual surgical needle regrasping, where physical experiments on the da Vinci Research Kit (dVRK) [dvrk2014] suggest a success rate of 96.7% on needles used in training, and 75% - 92.9% on needles unseen in training. On sequential handovers, HOUSTON successfully executes 32.4 handovers on average before failure.

II Related Work

II-A Automation in Surgical Robotics

Automation of surgical subtasks is an active area of research with a rich history. Prior literature has studied automation of tasks related to surgical cutting [thananjeyan2017multilateral, murali2015learning, krishnan2019swirl], debridement [murali2015learning, kehoe2014autonomous], hemostasis [ritcher_bloodflow_2020], peg transfer [hwang2020applying, hwang2020efficiently, paradis2020intermittent], and suturing [sen2016automating, thananjeyan2019safety, chiu2020bimanual, extraction_needles_2019, automated_needle_pickup_2018, saeidi_suturing_icra_2019]. While automated suturing has been studied in prior work [saeidi_suturing_icra_2019, sen2016automating], suturing without modifications such as painted fiducial markers is an open research problem. Recent work studies robust and general approaches to specific subproblems within suturing, including the precise manipulation of surgical needles from needle extraction [extraction_needles_2019] to bimanual regrasping [chiu2020bimanual], which is the focus of this work. Needle manipulation is also studied in [extraction_needles_2019], which considers extracting needles from tissue phantoms and computing robust grasps of the needle even in self-occluded configurations. Bimanual needle regrasping was studied in detail by [chiu2020bimanual], with impressive results from simulation-trained policies that take needle and end effector poses as input. We extend their problem definition to consider multiple handoffs of unmodified needles and end effectors, which requires perception of the needle and robot pose from images without color segmentation. Needle grasping has also been studied using visual servoing policies in [automated_needle_pickup_2018], where the needle is painted with green markers to track its position during closed-loop visual servoing. [varier2020collaborative] study tabular RL policies for needle regrasping in a discretized space in a fixed setup with known needle pose, with experiments on the dVRK performed without the needle. The experiments in [varier2020collaborative] suggest that value iteration-trained policies can mimic expert trajectories used for inverse reinforcement learning. In contrast, we present an algorithm compatible with significantly varying initial needle and gripper poses using only image observations. We additionally present extensive physical experiments with a needle, evaluating the success rate and speed of the algorithm.

II-B Visual Servoing and Active Perception

Visual servoing (VS) is a popular technique in robotics [hutchinson1996tutorial, kragic2002survey], and has recently been applied to compensate for surgical robot imprecision in the surgical peg transfer task [paradis2020intermittent]. While classical VS approaches typically make use of hand-tuned visual features and known system dynamics [chaumette2006visual, caron2013photometric], recent work proposes learning end-to-end visual servoing policies from examples [levine2018learning, QT-Opt]. To reduce the need for tuned features and dynamics models, and to reduce the number of training samples required to create a robust VS policy for bimanual needle regrasping, we present a hybrid approach that combines coarse motion planning with fine control, where a learned VS policy is only used in parts of the task where high precision is required. This framework, called intermittent visual servoing (IVS), was studied in detail by [paradis2020intermittent], where the system switches between a classical trajectory optimizer and an imitation learning VS policy based on the precision required at the time. Inspired by this technique, we present an IVS approach to bimanual needle regrasping that combines coarse perception and planning with fine VS control. Because this task requires reasoning about depth across several directions, we present a multi-view VS policy that learns to precisely hand over the needle based on several camera views.

Active perception is a popular technique with many variations to localize objects prior to manipulation by maximizing information gain about their poses [mihaylova2002comparison, salaris2017online, whitehead1990active, bajcsy1988active, arruda2016active]. This has been studied in the context of robot-assisted surgery, where the endoscope position is automatically adjusted via a policy learned from demonstrations to center the camera focus on inclusions during surgeon teleoperation. In this work, we actively servo the needle to highly visible poses to maximize the accuracy of its pose estimate. This is most similar to [arruda2016active], where the authors propose an algorithm to actively select views of a grasping workspace to uncover enough information about unknown objects to plan grasps.

III Problem Formulation

The HOUSTON problem extends and generalizes the previous problem definition from [chiu2020bimanual] to include unmodified needles, occlusion, and multiple handoffs.

III-A Overview

In the HOUSTON problem, the surgical robot starts with a curved surgical needle of known curvature and radius grasped by one gripper and must accurately pass it to the other gripper and back. This is challenging due to the needle’s reflective surface, thin profile, and pathological configurations [extraction_needles_2019, chiu2020bimanual], as depicted in Figure 2. Once the needle is successfully passed to the other end effector, it is passed back to the first end effector. This process is repeated up to N times, or until the needle is dropped. We also consider a special case of this problem, the single-handover version, in which N = 1.

III-B Notation

Let T_t^(l) and T_t^(r) denote the poses of the left and right grippers, respectively, at discrete timestep t with respect to a world coordinate frame. The needle has pose T_t^(n) with respect to the world frame. Observations of the workspace are available via RGB images from a stereo camera, I_t^(left) and I_t^(right), or an overhead monocular RGB image I_t^(o) from an RGB camera. The left and right cameras in the stereo pair have world poses T^(cl) and T^(cr), respectively. The overhead camera has pose T^(co) in the world frame. Each trial starts with the needle in the left gripper and ends when the needle is dropped. Additionally, the trial terminates if no successful handoff occurs within a fixed number of timesteps.

Fig. 2: Visible and occluded configurations: The left two frames depict needle orientations that are easily identifiable. However, the needle frequently reaches states that are occluded by the gripper or self-occluded, which makes estimating its state challenging.

At timestep t, the algorithm is provided an observation o_t which contains images from the sensors: o_t = (I_t^(left), I_t^(right), I_t^(o)). The algorithm outputs a target pose and jaw state for each end effector, e.g. (T̂_t^(l), g_t^(l)) for the left arm, where g_t^(l) indicates whether the left jaw is closed at timestep t.
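To make this interface concrete, the following minimal Python sketch shows one way to structure the observation o_t and the per-arm command implied by the notation above; the class and field names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the HOUSTON observation/action interface.
# Names and types are assumptions for exposition, not the authors' code.
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    """Images o_t available to the policy at timestep t."""
    left_stereo: np.ndarray    # H x W x 3 RGB from the left stereo camera
    right_stereo: np.ndarray   # H x W x 3 RGB from the right stereo camera
    overhead: np.ndarray       # H x W x 3 RGB from the overhead camera

@dataclass
class ArmCommand:
    """Target pose and jaw state for one end effector at timestep t."""
    target_pose: np.ndarray    # 4 x 4 homogeneous transform in the world frame
    jaw_closed: bool           # True if the jaw should be closed

@dataclass
class Action:
    left: ArmCommand
    right: ArmCommand
```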

III-C Assumptions

In order to deterministically evaluate HOUSTON policies in a wide variety of needle configurations, we discretize the needle-in-gripper pose possibilities by choosing a number of categories across three degrees of freedom:

  1. The needle’s curve can face either towards or away from the camera, providing 2 possibilities.

  2. The gripper may hold the needle either at the tip, or 30° inwards following the curvature of the needle. This degree of freedom has 2 possibilities.

  3. The rotation of the needle about the tangent line at the point where the gripper holds it is discretized into 7 bins, as in Figure 3.

This gives a total of 28 possible needle configurations and we perform a grid search over these possibilities. We chose these configurations to be representative of those seen in suturing tasks post-needle extraction. While the robot encoders provide an estimate of the gripper poses and , the precise needle pose is unknown due to cabling effects of the arms. We assume access to a stereo RGB pair in the workspace, an overhead RGB camera, and the transforms between the coordinate frames of these cameras and the robot arms.
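A small sketch of how the 2 x 2 x 7 = 28 evaluation configurations can be enumerated for the grid search; the label strings are ours and purely illustrative.

```python
# Illustrative enumeration of the 2 x 2 x 7 = 28 discretized needle-in-gripper
# start configurations used for evaluation (labels are ours, not the authors').
from itertools import product

curve_facing = ["towards_camera", "away_from_camera"]   # 2 options
grasp_point = ["tip", "30_deg_inward"]                  # 2 options
rotation_bin = range(7)                                 # 7 discretized rotations

configurations = list(product(curve_facing, grasp_point, rotation_bin))
assert len(configurations) == 28

for facing, grasp, rotation in configurations:
    # Each tuple defines one evaluation episode's needle reset (Section III-C);
    # a grid search visits every configuration once.
    pass
```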

III-D Evaluation Metrics

We evaluate HOUSTON by recording: i) the number of successful handoffs in a multi-handoff trial, ii) the success rate per arm of single handoffs beginning from each configuration in III-C, and iii) the average time for each handoff.

Fig. 3: Needle Reset Degrees of Freedom: We vary the starting configuration of the needle relative to the gripper in three degrees of freedom as described in Section III-C. Left: We rotate the needle into 7 discretized states about the axis tangent to the needle. Middle: An example where the needle is held 30° inward from the tip. Right: An example where the needle’s arc faces towards the camera.

IV HOUSTON Algorithm

HOUSTON uses active stereo visual servoing with both a coarse-motion and fine-motion learned policy for the bimanual regrasping task.

IV-A Phase 1: Active Needle Presentation

In the first phase, the algorithm repositions the needle to a pose where the other arm can easily grasp it without collisions and the cameras can clearly view it. Throughout execution of the coarse policy, we parameterize the needle as a circle of known radius, and measure its state in the world frame as a center point c, a normal vector n̂, and a needle tip point p_tip, as shown in Figure 6. Active Needle Presentation consists of two stages: needle acquisition and handover positioning. The needle acquisition stage moves the needle to maximize visibility, and the positioning stage uses visual servoing to move the needle into a graspable state.
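A minimal sketch of the coarse needle state used in this phase, assuming the circle parameterization described above (center, normal, tip, known radius); the field and helper names are hypothetical.

```python
# Minimal sketch of the coarse needle state used by the presentation policy:
# a circle of known radius with center c, plane normal n_hat, and tip p_tip.
# Field and method names are assumptions for illustration.
from dataclasses import dataclass
import numpy as np

@dataclass
class NeedleState:
    center: np.ndarray   # (3,) circle center c in the world frame
    normal: np.ndarray   # (3,) unit normal n_hat of the needle plane
    tip: np.ndarray      # (3,) estimated needle tip point p_tip
    radius: float        # known needle radius (e.g. 1.25 cm)

    def tip_direction(self) -> np.ndarray:
        """Unit vector from the circle center toward the needle tip."""
        v = self.tip - self.center
        return v / np.linalg.norm(v)
```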

IV-A1 Needle State Estimation

The state estimator passes stereo images into a fully convolutional neural network that is trained to output segmentation masks for the needle in each image. See the project website for architecture details. It computes a distance transform of the segmentation mask to label each pixel with its distance to the nearest unactivated pixel. Next, it finds peaks in the distance transform along horizontal lines in each image, which correspond to points near the center of activated patches. It then triangulates all pairs of peaks along each horizontal line in the images to obtain a point cloud, as in Figure 4. Because this may triangulate outlier points from the gripper or incorrectly match points on different parts of the needle, RANSAC is applied to filter out incorrect point correspondences and return the final predicted needle state. At each iteration, RANSAC samples a set of 3 points, to which a plane is fit. Each sample generates 2 candidate circles in the plane, corresponding to the two circles of known radius that pass through one of the point pairs. RANSAC uses an inlier radius of 1 mm and runs for 300 iterations.

The network is first trained on a dataset of 2000 simulated stereo images of randomly placed, textured, and scaled floating needles and random objects [calli2017yale] above a surface plane, generated in Blender 2.92. Lighting intensity, size, and position, as well as the stereo camera position, are also randomized. The segmentation network is fine-tuned on a dataset of 200 manually labeled images of the surgical needle in the end effector. Training the network on a PC with an NVIDIA V100 GPU takes 3 hours, and fine-tuning takes 10 minutes.
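The following sketch illustrates the RANSAC circle fit with known radius on the triangulated point cloud. The 1 mm inlier radius and 300 iterations follow the text; the sampling details, structure, and function names are assumptions for exposition.

```python
# Sketch of a RANSAC circle fit with known radius on the triangulated needle
# point cloud (Section IV-A1). Thresholds mirror the text (1 mm inliers,
# 300 iterations); everything else is illustrative.
import numpy as np

def fit_needle_circle(points: np.ndarray, radius: float,
                      iters: int = 300, inlier_thresh: float = 1e-3):
    """points: (N, 3) triangulated candidates in meters.
    Returns (center, normal, inlier_mask) of the best circle of the given radius."""
    best = (None, None, np.zeros(len(points), dtype=bool))
    if len(points) < 3:
        return best
    rng = np.random.default_rng(0)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(normal) < 1e-9:
            continue                              # degenerate (collinear) sample
        normal = normal / np.linalg.norm(normal)
        # Two circles of the known radius pass through the pair (p0, p1) in this
        # plane; their centers lie on the perpendicular bisector of the chord.
        mid, chord = (p0 + p1) / 2.0, p1 - p0
        half = np.linalg.norm(chord) / 2.0
        if half == 0.0 or half > radius:
            continue                              # pair unusable for this radius
        offset = np.sqrt(radius ** 2 - half ** 2)
        perp = np.cross(normal, chord / np.linalg.norm(chord))
        for center in (mid + offset * perp, mid - offset * perp):
            # Distance to the circle combines the out-of-plane component with
            # the in-plane deviation from the known radius.
            d_plane = (points - center) @ normal
            in_plane = (points - center) - np.outer(d_plane, normal)
            d_ring = np.linalg.norm(in_plane, axis=1) - radius
            inliers = np.sqrt(d_plane ** 2 + d_ring ** 2) < inlier_thresh
            if inliers.sum() > best[2].sum():
                best = (center, normal, inliers)
    return best
```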

Fig. 4: Needle state estimation: Execution of the stereo RGB pipeline on a highly visible needle. The network takes raw stereo images as input, producing segmentation masks of the needle. Triangulating the masks produces a point cloud to which a circle is fit with RANSAC (3rd panel). Inliers are shown in blue, the best-fit circle in green, and outliers in red. The resulting observation reprojected into the left image is shown in the final panel.

IV-A2 Visual Servoing

HOUSTON uses Algorithm 1 to compute updates for visual servoing in both the needle presentation and the handover positioning phases. Algorithm 1 is a fixed-point iteration method that uses a state estimator to iteratively visually servo to a target state. Similar to first and second order optimization algorithms, it computes a global update based on its current state and iterates until the computed update is zero. In each iteration, it queries the current 3D state estimate of the needle and then computes an update step in the direction of the target state. To compensate for estimation errors due to challenging needle poses, this process is repeated at each iteration until the algorithm converges to a local optimum within a pose error tolerance.

During needle acquisition, the arm moves to a home pose, then rotates around the world frame axes, stopping when the state estimator observes a threshold number of inlier points from RANSAC circle fitting. During trials, at most 2 consecutive rotations sufficed to resolve the state to this degree. After this initial acquisition, we apply Algorithm 1 to align the needle normal n̂ towards the left stereo camera position, where each update is a rotation computed from the current state estimate. Once the needle is clearly presented to the camera, we measure the tip point p_tip by choosing the inlier point from the circle fitting step which is furthest from the gripper in 3D space.

Subsequently, during handover positioning, we compute inverse kinematics to move the needle towards the center of the workspace with the tip pointing towards the other gripper and the needle normal orthogonal to the table plane. This flat configuration is critical for the grasping step, since the dVRK arm is primarily designed for top-down grasps near the center of its workspace. Because the arm only has 5 rotational degrees of freedom, we use a numerical IK solver from [tracik] and attempt to find a configuration minimizing rotational error within a centimeter-scale tolerance region on end effector translation. After moving to this pose, we repeat Algorithm 1, with the target defined so that the needle tip points towards the other gripper and the needle normal is orthogonal to the table. We calculate IK to a configuration with needle curvature towards the camera and one with curvature away from the camera, then pick the configuration which minimizes rotational error to the goal.
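As one illustration of the servoing update used during acquisition, the sketch below computes a partial rotation that moves the estimated needle normal toward a target direction (e.g., the left stereo camera). The step fraction, axis construction, and function names are assumptions; the paper does not specify this exact form.

```python
# Sketch of one coarse servoing update: rotate the wrist so that the estimated
# needle normal n_hat moves toward a target direction (the left stereo camera
# during acquisition, or the handover direction during positioning). A fraction
# alpha of the full correction is applied per iteration; alpha is a placeholder.
import numpy as np
from scipy.spatial.transform import Rotation as R

def alignment_update(n_hat: np.ndarray, target_dir: np.ndarray,
                     alpha: float = 0.5) -> R:
    """Rotation moving n_hat a fraction alpha of the way toward target_dir."""
    n = n_hat / np.linalg.norm(n_hat)
    t = target_dir / np.linalg.norm(target_dir)
    axis = np.cross(n, t)
    s = np.linalg.norm(axis)
    if s < 1e-8:
        return R.identity()          # already aligned (or exactly opposite)
    angle = np.arctan2(s, np.dot(n, t))
    return R.from_rotvec(alpha * angle * axis / s)
```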

Fig. 5: Fine-grained grasping policy: We split corrective actions along the x- and y-axes and learn two corresponding policies π_x and π_y. Each policy begins by performing an ego-centric crop by projecting the gripper’s kinematically calculated approximate location into the input image and cropping around it. Then, it feeds the cropped image through a neural network to predict a correction direction.

IV-B Phase 2: Executing a Grasping Policy

After the active presentation phase described in Section IV-A, the pose of the needle is relatively accurately known and accessible to the grasping arm, and we execute a grasping policy to grasp the needle. However, the needle pose estimate after the first phase may not be perfect, so we must visually servo to compensate for these errors when grasping. Even if the needle pose were perfectly known, reliable grasping of a small needle is still challenging due to the positioning errors of the robot, which are a result of its cable-driven arms [paradis2020intermittent, hwang2020efficiently, hwang2020applying, seita_icra_2018, peng2020real, mahler2014case].

The policy splits corrective actions between the x- and y-axes with two sub-policies, π_x and π_y. Each sub-policy uses RGB inputs ego-centrically cropped around the grasping arm, with one sub-policy using crops from the inclined camera and the other using crops from the overhead camera. The cropping forces the policy to condition on the relative position of the gripper and needle without the ability to overfit to texture cues from other parts of the scene. The fine-grained grasping policy and image crops are displayed in Figure 5. We ablate different design choices for the grasping correction policy and also present open-loop grasping results in Section V. Each grasping sub-policy is a neural network classifier that outputs whether the grasping arm should move in the negative (down in the crop) or positive (up in the crop) direction along its axis.

The policies are trained on offline human demonstrations collected through two methods: 1) we sample poses for the arms in the workspace such that the needle orientation is perturbed about each axis, then move the robot to a good grasping position via a keyboard teleoperation interface; 2) we execute the pre-handover positioning routine and position the robot in the desired grasp location by hand, after which the robot autonomously iterates through offsets in the x and y directions, labeling actions according to the offset from the goal position. We experimentally find that separating the policy across the two axes significantly improves grasp accuracy (Section V). A separate grasping policy is trained for each arm on 100 demonstrations each. Each demonstration takes 5-10 actions, and each dataset takes about an hour to collect. The policies π_x and π_y are each represented by voting ensembles of 5 classifiers, each of which has three convolutional layers and two fully connected layers. Details about model architectures are located on the project website.

During policy execution, we iteratively sample actions first from π_x, then π_y, waiting for each to converge before continuing. We multiply the action magnitude by a decay factor every time the network outputs the opposite action of the previous timestep. Servoing terminates when the action magnitude decays below a small millimeter-scale threshold. This enables implicit convergence to the goal without explicitly training the policy to stop. After both sub-policies converge, we execute a simple downward motion of 1 cm to grasp the needle.
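A minimal sketch of the fine-grained servoing loop for one axis, assuming the decay-on-reversal and convergence-by-decay behavior described above; the initial step size, decay factor, and stopping threshold are placeholders rather than the paper's tuned values.

```python
# Sketch of the fine-grained grasp servoing loop for a single axis: step in the
# direction predicted by the classifier, shrink the step whenever the prediction
# reverses, and stop once the step decays below a small threshold.
# Step size, decay factor, and threshold are placeholders.
def servo_axis(policy, get_crop, move_along_axis,
               step: float = 0.002, decay: float = 0.5,
               min_step: float = 0.0005) -> None:
    """policy(crop) -> +1 or -1; move_along_axis(delta) commands the gripper."""
    prev_direction = None
    while step > min_step:
        direction = policy(get_crop())              # ego-centric image crop
        if prev_direction is not None and direction != prev_direction:
            step *= decay                           # overshoot: shrink the step
        move_along_axis(direction * step)
        prev_direction = direction
```

Running this routine for each axis in turn, followed by the 1 cm downward motion, mirrors the control flow described above.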

1: State estimator f, target needle state s*, arm a grasping the needle, number of iterations K, tolerance ε, distance metric d.
2: for i = 1, ..., K do
3:     Predict needle state ŝ ← f(o_t)
4:     Compute gripper update Δ ← g(ŝ, s*)
5:     Update arm pose: T_t+1^(a) ← Δ · T_t^(a)
6:     if d(ŝ, s*) < ε then
7:         break
8:     end if
9: end for
Algorithm 1: Presentation Visual Servoing Policy
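For reference, a compact Python rendering of the fixed-point loop in Algorithm 1, with the state estimator, update rule, and distance metric passed in as callables; all names are illustrative.

```python
# Sketch of Algorithm 1 (presentation visual servoing) as a fixed-point loop:
# re-estimate the needle state, apply an update toward the target state, and
# stop once the estimate is within tolerance. Names are illustrative.
def presentation_servo(estimate_state, compute_update, apply_update,
                       target_state, distance, n_iters: int = 10,
                       tol: float = 1e-3) -> None:
    for _ in range(n_iters):
        state = estimate_state()                     # stereo RGB -> needle circle
        apply_update(compute_update(state, target_state))
        if distance(state, target_state) < tol:      # converged within tolerance
            break
```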
Fig. 6: Acquisition Stage Rollout: Rollout of the phase described in IV-A. Images are taken from the left camera and cropped to the gripper, with the circle observation projected in green. The pose goal is one in which the needle faces towards the camera. Note how uncertainty in the second image is resolved in later images as the needle reaches a more observable configuration.

V Physical Experiments

The experiments aim to answer the following question: how efficient and reliable is HOUSTON compared to baseline approaches? We also perform several ablation studies of method components in this section.

V-A Baselines

To evaluate the method for the task of active needle presentation, we compare to the following baselines:

  • Depth-based presentation: Instead of using the stereo RGB network to detect the needle pose, we use a depth image-based detection algorithm to detect the needle pose and servo it to the flat grasping pose. This method takes as input the depth image computed by the stereo camera's built-in depth estimation, masks the gripper out of the depth image using the dVRK's forward kinematics, then performs a volume crop around the end effector and fits a circle of known radius to the points using RANSAC to extract the state.

  • No-Sim-Data: This is an ablation of the RGB stereo segmentation network that is only trained on the small dataset of real data.

Fig. 7: Needles used: The needles are shown with a coin for scale. All models were trained using needle 1, while needles 2, 3 and 4 were also used for testing. Needles 1 and 3 have a radius of 1.25 cm, and needles 2 and 4 have radii of 1.75 cm and 0.75 cm, respectively.
TABLE I: Single Handover Physical Experiments. Columns: Success Rate (Successes / Total), Confidence Interval (Low, High), Completion Time (s), and Failure counts (P, X, Y). Rows: Open Loop, Shared Grasp Policy, No Sim Data, Depth-Based Presentation, HOUSTON (Left to Right), HOUSTON (Right to Left), and, for needles unseen in training, HOUSTON (Right to Left) with Needles 2, 3, and 4. We report success rates, confidence intervals, and durations taken over a grid search of the 28 start configurations described in Section III-C for the full surgical needle bimanual regrasping task. We report the frequency of three failure modes: (P) error in the presentation procedure, (X) error along the x-axis, and (Y) error along the y-axis. HOUSTON significantly outperforms baselines, which either have many presentation failures or grasp positioning failures. The No-Sim-Data ablation also has a high success rate, but we find that its segmentation masks are qualitatively less accurate and have more false positives in the workspace. We also present results with HOUSTON using three needles (Figure 7) that were unseen in training. HOUSTON performs best on the larger two needles (2 and 3), and performs worse on the smaller needle 4, where occlusions with the gripper are more severe.
TABLE II: Multiple Handover Physical Experiments. Columns: for each of Trials 1-5, the number of successful handovers (Num), time per handover (Time), and failure mode (F), along with the average number of handovers and time per handover (Avg. Num, T/H). Rows: Away config. (failure modes Y, P, P, P, X across trials) and Towards config. (failure modes Y, P). To evaluate the consistency of HOUSTON, we evaluate it on the full multiple handover HOUSTON task with a maximum number of handoffs N. We run HOUSTON with two different needle orientations during the trial. HOUSTON averages 26.20 and 38.60 successful passes in the two configurations, and has three runs with no failures.

To evaluate the design choices used in the fine-grained grasping policy, we compare to the following baselines:

  • Open Loop: Executes an open loop grasping motion to grasp the needle based only on needle geometry and inverse kinematics.

  • Shared Grasp Policy: Trains a single policy to output both x and y displacements, taking both camera crops as input.

To evaluate whether the system can transfer to needles unseen in training, we evaluate HOUSTON on three additional needles as in Figure 7.

V-B Experimental Setup

We perform experiments using the da Vinci Research Kit (dVRK) [dvrk2014], a cable-driven surgical robot with two needle driver arms whose grippers can open 1 cm. For perception, the setup includes a Zed Mini stereo camera angled slightly downwards to face the arms, and an overhead Zivid One Plus M camera facing directly down. Stereo images are captured at 2K resolution, and overhead images are captured at 1080p. The locations of the arms relative to each of the cameras are statically calibrated.

V-B1 Single handover

For single handover experiments, we manually vary the orientation of the gripper before each trial to the orientations described in III-C. A handoff is considered successful if the needle switches from one gripper to the other, and at the end is fully supported by the other gripper.

V-B2 Multiple handovers

For multiple handover experiments, we start the needle in the left gripper in a visible configuration to the camera, so that all errors are a result of handoffs rather than initialization. We evaluate two configurations: one where the needle arc ends facing the stereo camera in the grasping configuration (Towards), and one where it faces the opposite direction (Away). This configuration is typically maintained throughout each multi-handover trial because of the consistency of the needle presentation step.

V-C Single Handover Results

We evaluate HOUSTON and baselines on the single handover task in Table I, and perform multiple systematic passes over the starting configurations described in Section III-C. We find that HOUSTON is able to more reliably perform the task than comparisons, which either experience many presentation errors or many grasp positioning errors.

V-D Multiple Handover Results

We evaluate HOUSTON on the multiple handover task with two different starting configurations (Table II). The algorithm completes 26.20 and 38.60 successful handovers on average in the two configurations. In three trials, no errors occur, and we manually terminate them after reaching the maximum number of successful handovers.

V-E Failure Analysis

HOUSTON encounters three failure modes:

  • P: Presentation error: the robot fails to present the needle in an orientation that is in the plane of the table with the needle tip pointing toward the grasping arm. This may lead to grasping angles that are unreachable or out of the training distribution for the grasping arm.

  • X: Grasping positioning error (X): the x-axis grasping policy fails to line up with the needle prior to executing the y-axis grasping policy.

  • Y: Grasping positioning error (Y): the y-axis grasping policy fails to line up with the needle prior to grasping.

We categorize all of the failure modes encountered in Table I. We find that the open loop grasping policies are not able to consistently position well for grasping. HOUSTON has failures that are evenly distributed across the failure modes. Grasp policy servoing errors stem mainly from needle configurations that are far outside the distribution seen in training. Presentation phase failures stem primarily from mis-detection of the needle's true tip, either because of incomplete segmentation masks or from drift in robot kinematics causing the most distal needle point to not be the tip. This causes the servoing policy to rotate the needle away from the camera, after which it sometimes loses visibility and fails to bring the needle to the pre-handover pose. Multi-handoff failures most frequently arise because of subtle imperfections in grasp execution where the needle rotates to a difficult angle; during the subsequent handover, the needle can become obstructed by the holding gripper, inhibiting the grasping policy.

VI Discussion

In this work we present HOUSTON, a problem and an algorithm for reliably completing the bimanual regrasping task on unpainted surgical needles. To our knowledge, this work is the first to study the unmodified variant of the regrasping task. The main limitations of this approach are its reliance on human demonstrations to learn the grasping policy, and sensitivity to needle and environment appearance. We hypothesize that the former could be mitigated via self-supervised demonstration collection, or by exploring unsupervised methods for fine-tuning behavior cloned policies. Future work will address the latter issue by exploring more powerful network architectures leveraging stereo disparity such as [kollar2021simnet], and designing more autonomous data collection techniques which can label real needle data without human input. In future work, we will also study how to reorient needles between handovers for precise control of needle-in-hand pose and attempt to make needle tracking more robust to occlusions from tissue phantoms.