Since the deployment of surgical robotic devices such as the da Vinci® Surgical System, efforts to automate surgical tasks have become a popular area of research. The automation of surgery promises to help reduce surgeon fatigue and improve procedural consistency between surgeries, and perhaps one day to take over the surgery itself. Progress toward surgical automation has accelerated in recent years, with improvements in available open-source platforms such as the da Vinci Research Kit (dVRK) and simulators, coupled with significant advances in data-driven controller synthesis. Successful demonstrations of automated tasks have included cutting [18, 27], suture needle hand-off, suturing [24, 30], and debridement. Recent developments in perception for surgical robotics, such as the SuPer frameworks [12, 15], help bridge these autonomous policies from ideal scenarios to realistic, deformable, and noisy tissue environments.
While progress in developing autonomous surgical tasks has leapt forward, a key area that has received little attention is reactive policies to traumatic events, such as hemostasis. Hemostasis describes a state of the surgical field that is fulfilled when there is no site of active bleeding and the tissues are unobstructed by blood. The bleeding can originate from a visible or macroscopic blood vessel (artery or vein), or from the microvasculature and capillary network within tissues. Unlike previously automated tasks, which occur at a more predictable cadence within a procedure, bleeding is unpredictable and can necessitate hemostasis maneuvers at any time during any surgery. Surgical manipulation of any type (suction, grasping, retraction, cutting, dissection) can immediately cause bleeding. Bleeding can also occur in a delayed manner; for example, a vessel that is not definitively sealed can rupture spontaneously without direct contact. This work specifically addresses the problem of small/medium vessel rupture. Overall, there are four distinct stages in hemostasis for this scenario: (1) clearing the surgical field of blood; (2) identifying the bleeding source (vessel rupture); (3) grasping the identified vessel to temporarily stop the bleeding; and (4) closing the vessel rupture, usually with an energy device, clip, or suture. Each stage requires a complex set of manipulation skills as well as perception algorithms, which makes it non-trivial to implement.
In this paper, we describe an automated solution to the first task, clearing the surgical field, as shown in Fig. 1. This task involves first recognizing blood in a scene, then tracking blood flow temporally, and finally generating a trajectory in real time that controls a suction tool to remove the blood efficiently. To this end, we present the following novel contributions:
- the first complete automated solution for clearing the surgical field of flowing blood from a ruptured vessel using a robotic suction tool,
- a novel blood flow detection and tracking method that utilizes temporal visual information, and
- a trajectory generation technique from blood regions in the image frame for a surgical suction tool to follow and clear the surgical field.
The blood flow detection and tracking method is tested within various simulated scenes as well as a real-life case involving a vessel rupture during thyroidectomy. The complete solution is tested in a lab setting with the da Vinci Research Kit (dVRK) and a simulated surgical cavity for blood to flow through and collect in. The results from the experiments show the effectiveness of the blood flow tracking and surgical suction tool trajectory generation developed in this work.
II Related Works
Previous work on blood detection comes largely from the context of Wireless Capsule Endoscopy (WCE), where image processing for detection is used to speed up clinician workflow. The typical approach to blood detection in WCE is to classify either on a pixel level or using patch-based methods. The feature space used for classification is either the direct Red, Green, Blue (RGB) channels or the transformed Hue, Saturation, Value (HSV) channels. To efficiently process these color spaces, techniques such as support vector machines, chrominance moments combined with binary pattern textures, and neural networks [20, 6] have been demonstrated. While these methods are robust for small individual lesions, a surgical scene can contain stains from previous ruptures and larger amounts of blood flow, which make blood detection and, specifically, tracking a more challenging and complex problem.
There has been previous research on robots interacting with liquids in the act of pouring. However, these methods cannot be applied to a surgical setting since they are limited to specific objects for pouring and capturing liquids. Schenck and Fox applied deep neural networks to detect fluids, which can be combined with differentiable fluid dynamics to reconstruct 3D information. This detection method, however, requires labelled real-world data, which are challenging to collect in a surgical context. Yamaguchi and Atkeson instead used the heuristic of optical flow to detect moving fluid. The work presented here also uses optical flow to detect blood flow. However, instead of the classical method used by Yamaguchi and Atkeson, we apply a deep learning technique for improved optical flow estimation in a surgical environment and fuse the detections temporally with a novel temporal filter.
An overview of the proposed algorithms for blood region estimation and trajectory generation for an autonomous suction tool on a surgical robot is provided in Algorithm 1. At a high level, the blood region is estimated by updating a probability map on the scene, which describes the probability of each pixel in the image frame being blood or not. From the probability map, the blood region is extracted. A trajectory is then generated for the suction tool to follow in order to clear the surgical field from blood. The trajectory is constrained to stay within the blood region to maximize the blood removed.
The above images are generated by optical flow estimation from an in-vivo surgical scene. The left image shows the vectors of estimated image motion from optical flow, and the right image is a normalized heatmap of their magnitudes. Notice that the magnitude of optical flow detects the regions of blood flow well, while the orientation gives inconsistent information about the flow.
III-A Detecting Flowing Blood in Image Frame
Optical flow is chosen to detect flowing blood because it extracts information about all moving objects in the scene. In a surgical scene, the main motion comes from surgical tools and flowing fluids. A moving camera is another source of motion, but in robotic surgery the camera remains stationary while work is being done in a scene, and its position is reset only to change the field of view. We therefore consider only stationary scenes. To mask instrument motion from the scene, previously developed methods can be applied to effectively segment and remove pixels attributed to surgical tools from the image [12, 15, 1].
To estimate optical flow, a pretrained convolutional neural network (CNN) is used. A deep learning strategy is used instead of traditional methods such as Lucas-Kanade (used in previous work on robot pouring) because traditional optical flow approaches rely on the brightness constancy assumption, which is not valid in endoscopic procedures due to irregular lighting. In contrast, the architecture proposed by Teney and Hebert extracts motion from learned features that are invariant to textures, brightness, and contrast, which is ideal for detecting flowing blood from an endoscope.
Similar to previous work on robot pouring, we also found experimentally that the magnitude of optical flow is a good signal for detecting fluid motion while the orientation is not. An example of the processed image is shown in Fig. 2, comparing the RGB image to the amplitude map for optical flow.
Consider the magnitude of optical flow at each pixel of the image. Let the detection of blood at a pixel at a given time be a binary random variable. Blood is detected at the pixel if the magnitude of optical flow there is greater than a threshold; otherwise, no blood is detected. The probability model for these detections is then parameterized by two detection probabilities, one for each value of the pixel's true state (blood or not blood), which together describe an observation model (1) for the hidden state of the pixel.
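As a sketch under assumed notation, the detection rule and observation model can be written as follows. Here $m_t^i$ is the optical-flow magnitude at pixel $i$ and time $t$, $\tau$ is the detection threshold, $d_t^i$ is the binary detection, $b_t^i$ is the hidden blood state, and $p_{d \mid b}$, $p_{d \mid \bar{b}}$ are the two detection probabilities. All symbol names are assumptions chosen for illustration.

```latex
d_t^i =
\begin{cases}
  1 & \text{if } m_t^i > \tau \\
  0 & \text{otherwise}
\end{cases}
\qquad
\begin{aligned}
  P(d_t^i = 1 \mid b_t^i = 1) &= p_{d \mid b} \\
  P(d_t^i = 1 \mid b_t^i = 0) &= p_{d \mid \bar{b}}
\end{aligned}
```

The probabilities of a missed detection follow as the complements, $P(d_t^i = 0 \mid b_t^i) = 1 - P(d_t^i = 1 \mid b_t^i)$.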
III-B Temporal Filtering for Blood Region Detection
Although the magnitudes of optical flow provide a good initial estimate for blood detection, they are nevertheless noisy and require filtering. Therefore, a temporal filter based on a Hidden Markov Model (HMM) is proposed to fuse independent measurements of the pixel labels over time. The HMM tracks the discrete blood state of each pixel using the observation models in (1). The first transition probability models the probability that a pixel that is already blood will continue being blood. In the case of blood vessel ruptures, this should be set close to 1 since the vessel rupture will not stop emitting blood until it has been closed. For the transition probabilities where the pixel is not blood at the previous time step, an additional variable is introduced to the model that describes the state of the neighboring pixels. It is modelled as the Boolean-OR of the states of the neighboring pixels, so that a non-blood pixel has a different probability of becoming blood depending on whether any of its neighbors is blood.
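One way to write this transition model, again under assumed notation, is given below. Here $b_t^i$ is the hidden blood state of pixel $i$ at time $t$, $\mathcal{N}(i)$ is its set of neighboring pixels, $n_{t-1}^i$ is the Boolean-OR of the neighbors' states, and $p_{b \mid b}$, $p_{b \mid n}$, $p_{b \mid \bar{n}}$ are the three transition probabilities. These symbol names are illustrative assumptions.

```latex
\begin{aligned}
  P(b_t^i = 1 \mid b_{t-1}^i = 1) &= p_{b \mid b} \\
  P(b_t^i = 1 \mid b_{t-1}^i = 0,\; n_{t-1}^i) &=
  \begin{cases}
    p_{b \mid n}       & \text{if } n_{t-1}^i = 1 \\
    p_{b \mid \bar{n}} & \text{if } n_{t-1}^i = 0
  \end{cases} \\
  n_{t-1}^i &= \bigvee_{j \in \mathcal{N}(i)} b_{t-1}^j
\end{aligned}
```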
The temporal filter is designed to estimate the posterior probability of each pixel's state using the transition probabilities and observation models. This is done using a predict and update step after every detection. The predict step propagates the previous posterior through the transition probabilities, and the update step corrects the prediction with the new detection through the observation model. However, the exact predict expression involves the joint probability of a pixel's previous state and the state of its neighbors. Explicit estimation of this joint probability would be computationally intractable, so each pixel's probability of being blood is approximated to be independent of all others at the previous time step. With this simplification, the predict step factors into per-pixel terms, and the probability that at least one neighbor is blood can be found using the inclusion-exclusion principle.
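The predict and update steps under the independence approximation can be sketched in plain Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: the class and function names are hypothetical, the parameter values are taken from the implementation details, and the assignment of 0.98, 0.85, and 0.01 to the three transition roles is an assumption. Under independence, the inclusion-exclusion expression for "at least one neighbor is blood" reduces to one minus the product of the neighbors' non-blood probabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterParams:
    # Hypothetical names; the mapping of values to roles is an assumption.
    p_stay: float = 0.98       # P(blood now | blood before)
    p_spread: float = 0.85     # P(blood now | not blood before, a neighbor was blood)
    p_spawn: float = 0.01      # P(blood now | not blood before, no neighbor was blood)
    p_det_blood: float = 0.95  # P(detection | blood)
    p_det_clear: float = 0.2   # P(detection | not blood)


def neighbor_or_probability(post, r, c):
    """P(at least one 4-neighbor is blood), treating pixels as independent."""
    rows, cols = len(post), len(post[0])
    none_blood = 1.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            none_blood *= 1.0 - post[rr][cc]
    return 1.0 - none_blood


def filter_step(post, detections, params=FilterParams()):
    """One predict + update step over a 2D grid of posterior probabilities."""
    rows, cols = len(post), len(post[0])
    new_post = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            prev = post[r][c]
            p_n = neighbor_or_probability(post, r, c)
            # Predict: marginalize over the previous state and neighbor state.
            pred = params.p_stay * prev + (1.0 - prev) * (
                params.p_spread * p_n + params.p_spawn * (1.0 - p_n)
            )
            # Update: Bayes rule with the binary detection.
            if detections[r][c]:
                like_blood, like_clear = params.p_det_blood, params.p_det_clear
            else:
                like_blood = 1.0 - params.p_det_blood
                like_clear = 1.0 - params.p_det_clear
            num = like_blood * pred
            new_post[r][c] = num / (num + like_clear * (1.0 - pred))
    return new_post
```

Starting from a uniform prior of 0.1 and calling `filter_step` once per frame, pixels with repeated detections converge toward 1 while undetected pixels decay toward 0.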
To find the region of blood on the image frame, a mask is generated where all pixels with a posterior probability greater than 0.5 are set to 1, and the rest are set to 0. Then dilation and erosion morphological operations are applied once to reduce noise on the mask. Finally, the largest connected region of the mask is considered the region with blood flowing if its size is greater than a region-size threshold. This threshold keeps a detection from occurring when there is no actual blood flowing.
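The mask extraction described above can be sketched with the standard library alone. The function names are hypothetical, and the 3x3 structuring element, the 4-connectivity of regions, and the handling of image borders (out-of-image pixels are ignored) are assumptions, since the text does not specify them.

```python
from collections import deque


def morph(mask, combine):
    """Apply `combine` (any = dilation, all = erosion) over each 3x3 window.

    Pixels outside the image are ignored rather than padded.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            out[r][c] = 1 if combine(window) else 0
    return out


def largest_component(mask):
    """Return the pixel list of the largest 4-connected region of ones."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(component) > len(best):
                best = component
    return best


def extract_blood_region(posterior, min_size=20):
    """Threshold at 0.5, dilate then erode once, keep the largest region."""
    mask = [[1 if p > 0.5 else 0 for p in row] for row in posterior]
    mask = morph(mask, any)  # dilation
    mask = morph(mask, all)  # erosion
    region = [[0] * len(mask[0]) for _ in mask]
    component = largest_component(mask)
    if len(component) > min_size:  # otherwise report no blood region
        for r, c in component:
            region[r][c] = 1
    return region
```

Dilation followed by erosion fills small holes inside the region, and the size check returns an empty mask when only small spurious regions are present.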
III-C Trajectory Generation for Blood Suction
A start and end point must be decided to generate a trajectory for suction. The end point should be near the location of the vessel rupture in order to continuously remove any newly released blood. Meanwhile, the start point should be downstream of the flowing blood in order to effectively clear the surgical field when suctioning upstream towards the source. Therefore, the start and end points are estimated simply from the age of the pixels in the blood region. The pixels with the largest and smallest ages in the blood region are defined as the end and start points, respectively. To ensure that the end point is not generated at the exact edge of the blood stream, the blood region is eroded before the end point is selected.
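A minimal sketch of this start and end point selection is given below. It assumes that the age of a pixel is the number of consecutive frames it has belonged to the blood region, which the text does not state explicitly; the function names and the 4-neighbor erosion are likewise illustrative choices.

```python
def update_age(age, region):
    """Increment the age of pixels inside the region and reset it elsewhere."""
    return [[a + 1 if m else 0 for a, m in zip(age_row, mask_row)]
            for age_row, mask_row in zip(age, region)]


def erode(mask):
    """4-neighbor erosion; pixels on the image border are always removed."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if (mask[r][c] and mask[r - 1][c] and mask[r + 1][c]
                    and mask[r][c - 1] and mask[r][c + 1]):
                out[r][c] = 1
    return out


def pick_endpoints(age, region):
    """Return (start, end): youngest pixel of the region, oldest pixel of
    the eroded region. Ties are broken in row-major order."""
    inner = erode(region)
    region_px = [(r, c) for r, row in enumerate(region)
                 for c, m in enumerate(row) if m]
    inner_px = [(r, c) for r, row in enumerate(inner)
                for c, m in enumerate(row) if m]
    if not region_px or not inner_px:
        return None  # no usable blood region yet
    start = min(region_px, key=lambda rc: age[rc[0]][rc[1]])
    end = max(inner_px, key=lambda rc: age[rc[0]][rc[1]])
    return start, end
```

With blood spreading from a source, the oldest pixels sit near the source and the youngest at the advancing front, so the resulting trajectory runs upstream.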
The trajectory generated from the start to end point should also maximize its ability to suction blood while moving upstream. Standard minimum-distance paths are therefore not ideal, as they tend to run along the edges of the blood region rather than through its center. To center the trajectory in the blood region, an additional clearance reward is given to the motion planner. The clearance reward is generated by iteratively eroding the blood region up to a maximum number of iterations. The pixels left in the eroded region are given a fixed additional reward at each iteration. The final trajectory in the image frame is then generated using Dijkstra’s algorithm, where the path is constrained to stay within the blood region and the clearance reward is subtracted from the normal distance cost. An outline of this trajectory generation technique is shown in Algorithm 2, and an example is shown in Fig. 3. The trajectory is then executed if it is longer than a length threshold. This threshold gives time for the start and end points to stabilize so that an effective trajectory can be generated.
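The clearance reward and constrained shortest-path search can be sketched as follows. This is an illustrative version on a 4-connected pixel grid with unit step cost, which are assumptions; the function names are hypothetical, and the default values (4 erosions, reward 0.2) come from the implementation details. With those values the largest accumulated reward is 0.8, so every step cost stays positive, as Dijkstra's algorithm requires.

```python
import heapq


def erode(mask):
    """4-neighbor erosion; pixels on the image border are always removed."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if (mask[r][c] and mask[r - 1][c] and mask[r + 1][c]
                    and mask[r][c - 1] and mask[r][c + 1]):
                out[r][c] = 1
    return out


def clearance_reward(region, max_erosions=4, reward=0.2):
    """Accumulate `reward` on every pixel that survives each erosion."""
    rewards = [[0.0] * len(region[0]) for _ in region]
    current = region
    for _ in range(max_erosions):
        current = erode(current)
        if not any(any(row) for row in current):
            break
        for r, row in enumerate(current):
            for c, m in enumerate(row):
                if m:
                    rewards[r][c] += reward
    return rewards


def plan_path(region, start, end, rewards):
    """Dijkstra restricted to the region; entering a pixel costs 1 - reward."""
    if max(max(row) for row in rewards) >= 1.0:
        raise ValueError("clearance reward must stay below the unit step cost")
    rows, cols = len(region), len(region[0])
    dist, parent = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break
        if d > dist[node]:
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and region[nr][nc]:
                nd = d + 1.0 - rewards[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if end not in dist:
        return None  # end point not reachable inside the region
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

Among paths of equal length, the reward makes the planner prefer the one that passes through the interior of the region rather than along its boundary.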
IV Experiments and Results
The proposed blood flow detection and tracking method was evaluated on both simulated scenes and a live surgery involving a hemorrhage during a thyroidectomy. The complete automated suctioning solution was demonstrated in a lab setting using a simulated surgical cavity and red fluid in place of blood. The following sections describe these experiments, the necessary implementation details, and the results.
IV-A Implementation Details
All subsequent experiments were run on a computer with an Intel® Core i9-7940X Processor and NVIDIA’s GeForce RTX 2080. The blood flow detection and trajectory generation algorithms were implemented in MATLAB. The CNN for optical flow estimation is pre-trained on the Middlebury dataset and uses image frames as input. The resolution of the optical flow estimation is 1/4 of the input frame resolution. The size of the probability map is set to the optical flow resolution. The thresholds for detection, region size, maximum number of erosions for clearance, and trajectory length are set to 0.45, 20, 4, and 30, respectively. The two detection probabilities are set to 0.95 and 0.2, respectively. The initial probability of a pixel being blood is set to 0.1, and the three transition probabilities of a pixel being blood are set to 0.98, 0.85, and 0.01, respectively. The neighbors of a pixel are set to just the pixels directly up, down, left, and right. The clearance reward per erosion is set to 0.2.
IV-B Datasets to Evaluate Blood Region Detection
Two separate datasets were generated for this work to evaluate the proposed blood region detection algorithm. Both datasets have labelled ground-truth masks of the blood region. Performance is evaluated on these datasets using the Intersection over Union (IoU) metric: the number of pixels in the Boolean-AND of the detected blood-region mask from our proposed method and the ground-truth mask, divided by the number of pixels in their Boolean-OR.
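For binary masks stored as nested lists, the IoU metric can be computed as in the short sketch below. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here rather than one taken from the text.

```python
def iou(detected, ground_truth):
    """Intersection over Union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_d, row_g in zip(detected, ground_truth):
        for d, g in zip(row_d, row_g):
            intersection += 1 if (d and g) else 0  # Boolean-AND
            union += 1 if (d or g) else 0          # Boolean-OR
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0
```

For example, a detection that covers two of the three ground-truth pixels and adds one false-positive pixel yields an intersection of 2 and a union of 4.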
IV-B1 Simulated Scenes
Six simulated scenes of flowing blood are generated using Unity3D’s particle-based fluid dynamics (PBDs). The scenes are shown in Fig. 4. A total of 61 image frames were extracted per scene. The ground-truth mask of the blood region is generated by projecting the fluid particles onto a virtual camera’s image plane and applying Gaussian smoothing. The rendered image is set to 1095 by 1284 pixels.
IV-B2 In-Vivo Surgical Scene
After the completion of a thyroidectomy conducted on a pig (UCSD IACUC S19130), a rupture occurred on the carotid artery. The video data from this incident are used to evaluate the blood flow detection and tracking algorithm in a similar manner to the simulated scenes. For ground-truth masks of the blood region, 10 out of a total of 41 frames were manually annotated. The recorded image size was 480 by 640 pixels.
IV-C Performance of Blood Region Detection
To show the effectiveness of the tracking algorithm, a comparison experiment was conducted where the blood flow region was simply set to be the pixels with optical flow magnitude greater than the detection threshold. The distribution of IoU results is shown in Fig. 5 for blood flow region detection with and without the tracking algorithm on the simulated scenes and in-vivo dataset. There is a clear difference in performance of the blood flow detection and tracking between the simulated scenes and the in-vivo rupture. We believe this is due to the poorer lighting conditions and the reflective surgical clamps used in the in-vivo scene, as seen in Fig. 6. Nonetheless, the blood region is successfully estimated when using the tracking algorithm despite the many red stains, which highlights the importance of using temporal information for detection rather than color features. For additional comparison, Lucas-Kanade’s and Farnebäck’s optical flow estimation techniques were substituted for the CNN-based optical flow estimation. Note that Lucas-Kanade’s optical flow estimation is the detection method for fluids proposed by Yamaguchi and Atkeson for robot pouring. The resulting IoU was consistently below 0.50 for both the in-vivo and simulated scenes, which is substantially lower than that of our proposed detection technique.
IV-D Automated Suction in Cavity
To evaluate the complete autonomous surgical task of recognizing blood flow and performing autonomous suction, a tissue phantom with a cavity for liquid to flow through is constructed from silicone, and water with red coloring dye is drained into the cavity using surgical tubing, as shown in Fig. 7. A stereoscopic camera with a resolution of 1080 by 1920 pixels at 30 fps on the dVRK is used. The trajectory generated for suction is converted into 3D position commands using the Pyramid Stereo Matching Network (PSMNet), which takes the stereo images from the cameras and determines the depth of each pixel. PSMNet’s weights are taken from the original implementation without any task-specific fine-tuning; the maximum disparity is set to 192. PSMNet estimated depth using an image size of 480 by 680 pixels, while the blood flow detection algorithm used a reduced image size of 120 by 160 pixels to improve its speed.
A Patient Side Manipulator (PSM) from the dVRK was equipped with a da Vinci® Si Suction Tool and followed the generated trajectory to clear the simulated surgical cavity of blood. To follow the trajectory, the position of the end-effector in the PSM base frame is iteratively stepped toward the goal: at each iteration it moves along the direction from its current position to the goal by no more than a maximum step size. The goal is the 3D position generated by the trajectory and PSMNet, expressed in homogeneous coordinates and mapped into the base frame by the camera-to-base transform, which is estimated in real time using our previous work. This controller is run at 100 Hz and is repeated for each target position from the generated trajectory until the end-effector is within a distance tolerance of it. Meanwhile, the orientation of the suction tip is set to always be in the direction of gravity. The position and orientation of the end-effector are converted to joint angles using an analytical inverse kinematic solution. These joint angles are then regulated using the dVRK.
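The position update described above can be sketched as a simple bounded-step controller. This is an illustration under stated assumptions rather than the authors' controller: the function names are hypothetical, the transform is assumed to be a 4x4 homogeneous matrix mapping camera-frame points into the PSM base frame, and the maximum step size and stopping tolerance are left as arguments because their values are not given here.

```python
import math


def transform_point(T_base_cam, point_cam):
    """Map a 3D camera-frame point into the base frame via homogeneous coordinates."""
    homogeneous = [point_cam[0], point_cam[1], point_cam[2], 1.0]
    return [sum(T_base_cam[r][k] * homogeneous[k] for k in range(4))
            for r in range(3)]


def step_towards(position, goal_cam, T_base_cam, max_step):
    """Move `position` toward the goal by at most `max_step`.

    Returns the new position and the remaining distance to the goal.
    """
    goal = transform_point(T_base_cam, goal_cam)
    delta = [g - p for g, p in zip(goal, position)]
    distance = math.sqrt(sum(d * d for d in delta))
    if distance == 0.0:
        return list(position), 0.0
    step = min(max_step, distance)
    new_position = [p + step * d / distance for p, d in zip(position, delta)]
    return new_position, distance - step


def follow_target(position, goal_cam, T_base_cam, max_step, tolerance,
                  max_iterations=10000):
    """Repeat the step until the goal is within `tolerance` (same units)."""
    for _ in range(max_iterations):
        position, remaining = step_towards(position, goal_cam, T_base_cam, max_step)
        if remaining <= tolerance:
            break
    return position
```

In a real system each call to `step_towards` would be issued once per control cycle, with the transform refreshed from the tool tracker before every step.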
To account for imperfections in the 3D depth estimation from PSMNet and in the surgical tool tracking used to regulate the end-effector, the suction tool was commanded to oscillate in and out along the vertical axis by an additional 5 mm at every point on the trajectory. This probing behavior ensured that the tool always sucked up the blood and neither drifted above the blood nor penetrated and dragged tissue. The Robot Operating System (ROS) is used to encapsulate the image processing and robot trajectory tracking processes.
Roughly 50 mL of liquid is injected using a syringe into the cavity at one of the four locations highlighted in Fig. 7. Before each trial, the end-effector is centered in the middle of the silicone mold as shown in Fig. 7. The percentage of liquid removed from the cavity, the time to react to the injected fluid, and the time to complete the trajectory were measured to evaluate the performance of the proposed automation method. The percentage of liquid removed was measured by weighing the silicone mold and syringe before and after each trial. Time to react refers to the time taken to detect the flowing blood and generate a trajectory (i.e., completing Algorithm 1) from the first moment that the injected blood is visible in the camera frame. To ensure consistency of the proposed automation method, the experiment is repeated ten times at each of the four injection spots.
The results from the 40 total trials of the automated suction experiment are shown in Table I, and an example sequence is shown in Fig. 8. During the experiments, we noticed that the liquid generally pooled near the bottom left injection point since it was slightly lower with respect to gravity than the rest of the cavity. This led to shorter trajectories being generated, and hence less time to execute them, for the bottom left corner experiment compared to the others, as seen in the results.
A similar setup is repeated for demonstration purposes, and the result is shown in Fig. 1. The mold used for this demonstration was constructed using candle wax and has pig intestine embedded in it. Everything else is kept the same as previously described. Despite changes to the visual textures and topology of the scene, the method is robust enough to perform autonomous tracking and suction of blood without modification.
|Injection Point||Liquid||Time to React||Time to Execute|
V Discussion and Conclusion
In this work, the first completely automated solution for clearing the surgical field of blood is presented. The solution provides the perception, trajectory generation, and control strategy required for the task of clearing blood. This is the first step taken towards automation of a crucial surgical task, hemostasis, which can be required in any surgery at any time. To ensure robustness against blood stains, the algorithm relies on temporal information for detection. The novel blood flow detection and tracking algorithm presented offers a unique, probabilistic solution to tracking liquids over 3D cavities and channels under noisy and harsh visibility conditions, and is a critical perceptual element. A trajectory generation technique that acts upon the detected blood is also presented; it uses a clearance reward to maximize the blood suctioned by the tool and to be robust against imperfect blood region estimation.
For future work, we intend to push towards a complete solution for the automation of hemostasis. To accomplish this, a more precise estimation of the source is needed, which will be investigated by using a particle-based motion model, similar to particle-based physics simulators, in the temporal filtering. This should improve the blood flow tracking performance since it will provide a more exact prediction step. In addition, the motion model can be used as a forward prediction for model predictive control techniques so that the suction tool can anticipate, rather than react to, the blood flow. Fine tissue manipulation policies that have not been explored, such as vessel grasping and clipping, will be investigated, likely leveraging open-sourced platforms like the da Vinci Reinforcement Learning (dVRL) toolkit.
F. Richter is supported by the NSF Graduate Research Fellowship. The authors would like to thank Intuitive Surgical Inc. for instrument donations, Jingpei Lu for his support with PSMNet, Harleen Singh for her support with the molds, and Simon DiMaio, Omid Mohareri, Dale Bergman, and Anton Deguet for their support with the dVRK.
- [1] (2019) 2017 robotic instrument segmentation challenge. arXiv preprint arXiv:1902.06426.
- [2] (2011) A database and evaluation methodology for optical flow. International journal of computer vision 92 (1), pp. 1–31.
- [3] (2018) Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410–5418.
- [4] (2018) Automated pick-up of suturing needles for robotic surgical assistance. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1370–1377.
- [5] (2003) Two-frame motion estimation based on polynomial expansion. In Scandinavian conference on Image analysis, pp. 363–370.
- [6] (2011) Bleeding region detection in wce images based on color features and neural network. In 2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1–4.
- [7] (2013) Computer-aided bleeding detection in wce video. IEEE journal of biomedical and health informatics 18 (2), pp. 636–642.
- [8] (2008) Active blood detection in a high resolution capsule endoscopy using color spectrum transformation. In 2008 International Conference on BioMedical Engineering and Informatics, Vol. 1, pp. 859–862.
- [9] (2014) An open-source research kit for the da vinci® surgical system. In 2014 IEEE international conference on robotics and automation (ICRA), pp. 6434–6439.
- [10] (2014) Autonomous multilateral debridement with the raven surgical robot. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1432–1439.
- [11] (2009) Computer-aided detection of bleeding regions for capsule endoscopy images. IEEE Transactions on biomedical engineering 56 (4), pp. 1032–1039.
- [12] (2020) SuPer: a surgical perception framework for endoscopic tissue manipulation with surgical robotics. IEEE Robotics and Automation Letters 5 (2), pp. 2294–2301.
- [13] (2011) Computer-aided decision support systems for endoscopy in the gastrointestinal tract: a review. IEEE reviews in biomedical engineering 4, pp. 73–88.
- [14] (2009) Obscure bleeding detection in endoscopy images using support vector machines. Optimization and engineering 10 (2), pp. 289–299.
- [15] (2020) SuPer deep: a surgical perception framework for robotic tissue manipulation using deep learning for feature extraction. arXiv preprint arXiv:2003.03472.
- [16] (1981) An iterative image registration technique with an application to stereo vision.
- [17] (2017) See the glass half full: reasoning about liquid containers, their volume and content. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1871–1880.
- [18] (2015) Learning by observation for surgical subtasks: multilateral cutting of 3d viscoelastic and 2d orthotropic tissue phantoms. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1202–1209.
- [19] (2019) Real-time identification of blood regions for hemostasis support in laparoscopic surgery. Signal, Image and Video Processing 13 (2), pp. 405–412.
- [20] (2011) Bleeding detection in wireless capsule endoscopy based on probabilistic neural network. Journal of medical systems 35 (6), pp. 1477–1484.
- [21] (2019) Open-sourced reinforcement learning environments for surgical robotics. arXiv preprint arXiv:1903.02090.
- [22] (2018) Perceiving and reasoning about liquids using fully convolutional networks. The International Journal of Robotics Research 37 (4-5), pp. 452–471.
- [23] (2018) Spnets: differentiable fluid dynamics for deep neural networks. arXiv preprint arXiv:1806.06094.
- [24] (2016) Automating multi-throw multilateral surgical suturing with a mechanical needle guide and sequential convex optimization. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4178–4185.
- [25] Robotic operating system.
- [26] (2016) Learning to extract motion from videos in convolutional neural networks. In Asian Conference on Computer Vision, pp. 412–428.
- [27] (2017) Multilateral surgical pattern cutting in 2d orthotropic gauze with deep reinforcement learning policies for tensioning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2371–2378.
- [28] (2016) Stereo vision of liquid and particle flow for robot pouring. In 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pp. 1173–1180.
- [29] (2017) Robot autonomy for surgery. In Encyclopedia of Medical Robotics, pp. 281–313.
- [30] (2019) Dual-arm robotic needle insertion with active tissue deformation for autonomous suturing. IEEE Robotics and Automation Letters 4 (3), pp. 2669–2676.