Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

03/29/2022
by Yueming Jin, et al.

Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules (e.g., dilated convolution, convolutional LSTM), which only make use of the local context. In this paper, we propose a novel framework, STswinCL, that explores the complementary intra- and inter-video relations to boost segmentation performance by progressively capturing the global context. We first develop a hierarchical Transformer to capture the intra-video relation, which includes richer spatial and temporal cues from neighboring pixels and previous frames. A joint space-time window shift scheme is proposed to efficiently aggregate these two cues into each pixel embedding. Then, we explore the inter-video relation via pixel-to-pixel contrastive learning, which structures the global embedding space well. A multi-source contrast training objective is developed to group the pixel embeddings across videos under ground-truth guidance, which is crucial for learning the global properties of the whole dataset. We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and CaDIS datasets. Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches. Code will be available at https://github.com/YuemingJin/STswinCL.
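To make the idea of pixel-to-pixel contrastive learning with ground-truth guidance more concrete, the following is a minimal sketch of a generic supervised, InfoNCE-style pixel contrast loss. It is not the multi-source objective of the paper, whose details are not given in the abstract. The function name `pixel_contrast_loss`, the `temperature` value, and the toy data are assumptions made purely for illustration. Pixels sharing a ground-truth class are treated as positives and all other sampled pixels as negatives, regardless of which video they come from.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_contrast_loss(embeddings, labels, temperature=0.1):
    """Generic supervised InfoNCE-style loss over sampled pixel embeddings.

    embeddings: list of feature vectors, one per sampled pixel; the pixels
                may be drawn from different videos (hypothetical setup).
    labels:     ground-truth class id of each pixel.
    Returns the mean loss over anchors that have at least one positive.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without same-class pixels contributes nothing
        # Temperature-scaled cosine similarity to every other pixel.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-likelihood of the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


# Toy example: four 2-D pixel embeddings forming two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = pixel_contrast_loss(toy_embeddings, [0, 0, 1, 1])
loss_misaligned = pixel_contrast_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy setup the loss is small when the labels agree with the clusters in embedding space and large when they do not. This is the sense in which such an objective pulls same-class pixel embeddings together across videos.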

research
03/17/2021

Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

Real-time surgical phase recognition is a fundamental task in modern ope...
research
07/20/2022

Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations

Surgical scene segmentation is fundamentally crucial for prompting cogni...
research
12/24/2022

MURPHY: Relations Matter in Surgical Workflow Analysis

Autonomous robotic surgery has advanced significantly based on analysis ...
research
07/27/2023

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

Recent advancements in surgical computer vision applications have been d...
research
10/03/2016

Video Pixel Networks

We propose a probabilistic video model, the Video Pixel Network (VPN), t...
research
09/25/2021

ReCal-Net: Joint Region-Channel-Wise Calibrated Network for Semantic Segmentation in Cataract Surgery Videos

Semantic segmentation in surgical videos is a prerequisite for a broad r...
research
12/06/2021

Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

Automatic delineation of organ-at-risk (OAR) and gross-tumor-volume (GTV...
