Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding

01/28/2022
by Lalithkumar Seenivasan, et al.

Global and local relational reasoning enable scene understanding models to perform human-like scene analysis and understanding. Scene understanding enables better semantic segmentation and object-to-object interaction detection. In the medical domain, a robust surgical scene understanding model allows the automation of surgical skill evaluation, real-time monitoring of a surgeon's performance, and post-surgical analysis. This paper introduces a globally-reasoned multi-task surgical scene understanding model capable of performing instrument segmentation and tool-tissue interaction detection. Here, we incorporate global relational reasoning in the latent interaction space and introduce multi-scale local (neighborhood) reasoning in the coordinate space to improve segmentation. Utilizing the multi-task model setup, the performance of the visual-semantic graph attention network in interaction detection is further enhanced through global reasoning. The global interaction space features from the segmentation module are introduced into the graph network, allowing it to detect interactions based on both node-to-node and global interaction reasoning. By sharing common modules, our model reduces the computation cost compared to running two independent single-task models, a saving that is indispensable for practical applications. Using a sequential optimization technique, the proposed multi-task model outperforms other state-of-the-art single-task models on the MICCAI Endoscopic Vision Challenge 2018 dataset. We also examine the performance of the multi-task model when trained using the knowledge distillation technique. The official code implementation is available on GitHub.
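The abstract mentions training the multi-task model with knowledge distillation but does not state the objective used. As a rough illustration only, the sketch below assumes the common temperature-softened formulation, in which a student is pushed toward a teacher's softened class distribution via a KL term that is mixed with the ordinary task loss. The function names, the weight `alpha`, and the temperature value are hypothetical and are not taken from the paper or its code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of a temperature-scaled softmax, computed with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T^2.

    Assumption: this is the generic distillation term, not necessarily
    the objective used by the authors.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def total_loss(task_loss, student_logits, teacher_logits, alpha=0.5, temperature=2.0):
    """Weighted sum of a supervised task loss and the distillation term.

    `alpha` is a hypothetical mixing weight chosen for illustration.
    """
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * task_loss + alpha * kd
```

In this formulation the distillation term vanishes when student and teacher logits agree and grows as their softened distributions diverge; in a multi-task setting, one such term could in principle be computed per task head against a corresponding single-task teacher.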

research
03/10/2020

AP-MTL: Attention Pruned Multi-task Learning Model for Real-time Instrument Detection and Segmentation in Robot-assisted Surgery

Surgical scene understanding and multi-tasking learning are crucial for ...
research
06/08/2023

InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding

Multi-task scene understanding aims to design models that can simultaneo...
research
11/28/2022

Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding

Purpose: Surgery scene understanding with tool-tissue interaction recogn...
research
03/22/2023

Self-distillation for surgical action recognition

Surgical scene understanding is a key prerequisite for context-aware deci...
research
07/07/2020

Learning and Reasoning with the Graph Structure Representation in Robotic Surgery

Learning to infer graph representations and performing spatial reasoning...
research
03/22/2022

4D-OR: Semantic Scene Graphs for OR Domain Modeling

Surgical procedures are conducted in highly complex operating rooms (OR)...
research
03/03/2021

Arthroscopic Multi-Spectral Scene Segmentation Using Deep Learning

Knee arthroscopy is a minimally invasive surgical (MIS) procedure which ...
