SurgMAE: Masked Autoencoders for Long Surgical Video Analysis

05/19/2023
by Muhammad Abdullah Jamal, et al.

There has been growing interest in using deep learning models to process long surgical videos, in order to automatically detect clinical and operational activities and extract metrics that can enable workflow efficiency tools and applications. However, training such models requires vast amounts of labeled data, which is costly and does not scale. Recently, self-supervised learning has been explored in the computer vision community to reduce the annotation burden. Masked autoencoders (MAE) have gained attention in the self-supervised paradigm for Vision Transformers (ViTs): they predict randomly masked regions given the visible patches of an image or a video clip, and have shown superior performance on benchmark datasets. However, the application of MAE to surgical data remains unexplored. In this paper, we first investigate whether MAE can learn transferable representations in the surgical video domain. We propose SurgMAE, a novel architecture with a masking strategy based on sampling high spatio-temporal tokens for MAE. We provide an empirical study of SurgMAE on two large-scale long surgical video datasets, and find that our method outperforms several baselines in the low-data regime. We conduct extensive ablation studies to show the efficacy of our approach, and also demonstrate its superior performance on UCF-101 to show that it generalizes to non-surgical datasets as well.
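The masking step that the abstract refers to can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`patchify`, `random_mask`, `score_based_mask`), the 2x16x16 token size, the 0.9 mask ratio, and the use of per-token pixel variance as a score are all assumptions made for illustration. In particular, the abstract does not say how SurgMAE scores tokens or whether high-scoring tokens are kept visible or masked; the second function simply keeps the highest-scoring tokens visible.

```python
import numpy as np


def patchify(clip, t=2, p=16):
    """Split a clip of shape (T, H, W, C) into flattened space-time tokens.

    Each token covers t frames and a p x p pixel patch (sizes are assumptions).
    Returns an array of shape (num_tokens, t * p * p * C).
    """
    T, H, W, C = clip.shape
    assert T % t == 0 and H % p == 0 and W % p == 0, "clip must tile evenly"
    x = clip.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (T/t, H/p, W/p, t, p, p, C)
    return x.reshape(-1, t * p * p * C)


def random_mask(num_tokens, mask_ratio, rng):
    """Standard MAE-style masking: hide a uniformly random subset of tokens.

    Returns a boolean array where True means "masked" (hidden from the encoder).
    """
    num_keep = int(num_tokens * (1 - mask_ratio))
    order = rng.permutation(num_tokens)
    mask = np.ones(num_tokens, dtype=bool)
    mask[order[:num_keep]] = False  # the first num_keep shuffled tokens stay visible
    return mask


def score_based_mask(tokens, mask_ratio):
    """Hypothetical score-based masking (NOT the rule from the paper).

    Scores each token by its pixel variance and keeps the highest-scoring
    tokens visible; every other token is masked.
    """
    num_tokens = tokens.shape[0]
    num_keep = int(num_tokens * (1 - mask_ratio))
    scores = tokens.var(axis=1)
    keep = np.argsort(scores)[num_tokens - num_keep:]  # indices of the top scores
    mask = np.ones(num_tokens, dtype=bool)
    mask[keep] = False
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 224, 224, 3))  # synthetic stand-in for a video clip
    tokens = patchify(clip)
    mask = random_mask(tokens.shape[0], mask_ratio=0.9, rng=rng)
    visible = tokens[~mask]  # only these tokens would be fed to the encoder
    print(tokens.shape, visible.shape)
```

In an MAE-style pipeline, the encoder would see only `tokens[~mask]`, and a lightweight decoder would be trained to reconstruct the pixels of the masked tokens. Swapping `random_mask` for a score-based rule changes only which tokens are hidden, which is why the masking strategy can be studied independently of the rest of the architecture.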
