ESAD: Endoscopic Surgeon Action Detection Dataset

by Vivek Singh Bawa, et al.

In this work, we aim to increase the effectiveness of surgical assistant robots. We intend to make assistant robots safer by making them aware of the surgeon's actions, so that they can take appropriate assisting actions. In other words, we aim to solve the problem of surgeon action detection in endoscopic videos. To this end, we introduce a challenging dataset for surgeon action detection in real-world endoscopic videos. Action classes are picked based on feedback from surgeons and annotated by medical professionals. Given a video frame, we draw a bounding box around the surgical tool that is performing an action and label it with the action label. Finally, we present a frame-level action detection baseline model based on recent advances in object detection. Results on our new dataset show that it poses enough interesting challenges for future methods and can serve as a strong benchmark for research on surgeon action detection in endoscopic videos.





1 Introduction

Why do we need such a dataset?

Minimally Invasive Surgery (MIS) is a very sensitive medical procedure. A typical MIS procedure involves two surgeons: a main surgeon and an assistant surgeon. The success of any MIS procedure depends on multiple factors, such as the attentiveness and competence of both surgeons and the effectiveness of the coordination between them.

According to the Lancet Commission, 4.2 million people die each year within 30 days of surgery [nepogodiev2019global]. Another study at Johns Hopkins University states that 10% of all deaths in the USA are due to medical error [jhustudy].

Artificial Intelligence is being used in many applications where human error has to be mitigated. The proposed dataset is one step in the same direction. To make surgical procedures safer, we should be able to identify and track the actions of the main surgeon as well as the assistant surgeon. This dataset was developed with the assistance of medical professionals and expert surgeons. More details of the dataset can be found in section 4.

How is this going to help push the research forward? Although there are many datasets for action detection, there is no existing dataset for action detection in medical computer vision. Given the complexity of the scene and the difficulty of detecting surgeon actions, this dataset will set a path and a benchmark for the medical computer vision research community. In our experiments, we found that it is very difficult to correctly localise the bounding box for any action; this is discussed further in section 3.

Briefly, how do we create it?

Resulting main contributions?

2 Related work

Related endoscopic vision works?

Related endoscopic imaging datasets?

Action detection works and datasets?

3 Problem statement

Problem based on image data; localisation type of problem: there are no specific boundaries for an action bounding box.
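To illustrate why the lack of clear action boundaries matters, the following minimal sketch computes the intersection over union (IoU) between two boxes that could both plausibly mark the same action. The box coordinates are made up for illustration, and IoU is used here only as a common localisation measure; this section does not state which measure the benchmark itself uses.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels; the coordinate convention is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two hypothetical annotations of the same action:
tight_box = (100.0, 100.0, 200.0, 200.0)  # tool tip only
loose_box = (50.0, 50.0, 250.0, 250.0)    # tool tip plus surrounding tissue

print(iou(tight_box, loose_box))  # 0.25
```

Under these made-up coordinates, two defensible annotations of one action overlap with an IoU of only 0.25, which is the kind of ambiguity the localisation problem has to cope with.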

4 ESAD Dataset

The proposed dataset focuses specifically on the prostatectomy procedure. We recorded four full prostatectomy procedures with the consent of the patients. In a second stage, we formalised the set of actions that a surgeon can perform during a prostatectomy. After a thorough analysis, we finalised a set of 21 actions. The list of actions, along with the number of samples, is given in table 1.

The complete dataset is divided into three sets: training, validation and test. The training set contains two complete prostatectomy procedures. ESAD has 18793 annotated frames for training, with a total of 27998 action instances. The class-wise distribution of samples is given in table 1. The validation set has 4576 annotated frames with 7120 action instances, and the test set comprises 6088 annotated frames with 11207 action instances.

Instead of randomly assigning samples to each set, we use complete surgeries as one set. The reason behind this choice is that we do not want any of the sets to be biased toward one class. Secondly, using a whole procedure as one set preserves the natural rate at which samples occur during a real procedure. As table 1 shows, some classes have many more samples than others.
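The procedure-level split described above can be sketched as follows. This is a minimal illustration and not the authors' tooling: the record layout, the procedure identifiers and the assignment of the remaining two procedures to validation and test are all assumptions (the text only states that the training set contains two complete procedures).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical annotation records: (procedure_id, frame_id, action_label).
Record = Tuple[str, int, str]

frames: List[Record] = [
    ("proc_1", 0, "CuttingTissue"),
    ("proc_1", 1, "PullingTissue"),
    ("proc_2", 0, "SuckingBlood"),
    ("proc_3", 0, "PullingTissue"),
    ("proc_4", 0, "CuttingTissue"),
]

# Assumed mapping of whole procedures to splits (two for training, as stated;
# one each for validation and test is an assumption).
split_of_procedure: Dict[str, str] = {
    "proc_1": "train",
    "proc_2": "train",
    "proc_3": "val",
    "proc_4": "test",
}


def split_by_procedure(
    records: List[Record], assignment: Dict[str, str]
) -> Dict[str, List[Record]]:
    """Send every frame of a procedure to that procedure's split."""
    splits: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        splits[assignment[record[0]]].append(record)
    return dict(splits)


splits = split_by_procedure(frames, split_of_procedure)
for name, records in splits.items():
    print(name, sorted({proc for proc, _, _ in records}))
```

Because the split key is the procedure rather than the frame, no surgery contributes frames to more than one set, and each set keeps the class frequencies of the surgeries it contains.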

Label                    Train   Val   Test
CuttingMesocolon           315   179    188
PullingVasDeferens         457   245    113
ClippingVasDeferens         33    25     48
CuttingVasDeferens          71    22     36
ClippingTissue             215    44     15
PullingSeminalVesicle     2712   342    436
ClippingSeminalVesicle     118    35     33
CuttingSeminalVesicle     2509   196    307
SuckingBlood              3753   575   1696
SuckingSmoke               381   238    771
PullingTissue             4877  2177   2024
CuttingTissue             3715  1777   2055
BaggingProstate             34     5     37
BladderNeckDissection     1621   283    519
BladderAnastomosis        3585   298   1828
PullingProstate            958    12    451
ClippingBladderNeck        151    24     18
CuttingThread              108    22     40
UrethraDissection          351    56    439
CuttingProstate           1845    56     48
PullingBladderNeck         189   509    105
Table 1: List of actions for ESAD dataset with number of samples for training, validation and test.
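As a quick consistency check, the per-class counts in table 1 can be summed and compared with the instance totals quoted in the text. The sketch below copies the counts from the table; the variable names are illustrative only.

```python
# (train, val, test) action instances per class, copied from table 1.
counts = {
    "CuttingMesocolon": (315, 179, 188),
    "PullingVasDeferens": (457, 245, 113),
    "ClippingVasDeferens": (33, 25, 48),
    "CuttingVasDeferens": (71, 22, 36),
    "ClippingTissue": (215, 44, 15),
    "PullingSeminalVesicle": (2712, 342, 436),
    "ClippingSeminalVesicle": (118, 35, 33),
    "CuttingSeminalVesicle": (2509, 196, 307),
    "SuckingBlood": (3753, 575, 1696),
    "SuckingSmoke": (381, 238, 771),
    "PullingTissue": (4877, 2177, 2024),
    "CuttingTissue": (3715, 1777, 2055),
    "BaggingProstate": (34, 5, 37),
    "BladderNeckDissection": (1621, 283, 519),
    "BladderAnastomosis": (3585, 298, 1828),
    "PullingProstate": (958, 12, 451),
    "ClippingBladderNeck": (151, 24, 18),
    "CuttingThread": (108, 22, 40),
    "UrethraDissection": (351, 56, 439),
    "CuttingProstate": (1845, 56, 48),
    "PullingBladderNeck": (189, 509, 105),
}

# Column sums should match the totals stated for each split.
totals = [sum(c[i] for c in counts.values()) for i in range(3)]
print(totals)  # [27998, 7120, 11207]

# Most and least frequent classes in the training set.
most = max(counts, key=lambda k: counts[k][0])
least = min(counts, key=lambda k: counts[k][0])
print(most, counts[most][0])    # PullingTissue 4877
print(least, counts[least][0])  # ClippingVasDeferens 33
```

The column sums agree with the 27998, 7120 and 11207 action instances reported for training, validation and test, and the gap between the most and least frequent training classes (4877 versus 33 instances) shows how strongly imbalanced the classes are.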

Some samples from the dataset are shown in figure 1.


Figure 1: Target output semantic segmentation map for the images from the endoscope during prostatectomy procedure

5 Baseline Models

6 Experiments

6.1 Evaluation metrics

6.2 Frame-level Action Detection

6.3 Video-level Action Detection

7 Discussion

8 Conclusion