GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos

07/20/2023
by Nisarg A. Shah, et al.
Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgery. Existing state-of-the-art methods for surgical step recognition either model spatial and temporal information in separate stages or, when they learn the two jointly, operate only on short-range temporal context. As a result, the benefits of jointly modeling spatio-temporal features and long-range information are not taken into account. In this paper, we propose a vision transformer-based approach that jointly learns spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer
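
The abstract does not spell out how the gate combines the two feature streams, so the following is only a minimal sketch of one common form of gated fusion, not the authors' implementation. It assumes a single scalar sigmoid gate computed from the concatenated short-term and long-term feature vectors; the names `gated_fusion`, `short_feat`, `long_feat`, `weights`, and `bias` are hypothetical and do not come from the paper or its repository.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(short_feat, long_feat, weights, bias=0.0):
    """Blend two equal-length feature vectors with one scalar gate.

    Hypothetical illustration only: the gate is a sigmoid of a linear
    function of the concatenation [short_feat; long_feat]. The gating
    used in the paper may differ.
    """
    if len(short_feat) != len(long_feat):
        raise ValueError("feature vectors must have the same length")
    concat = list(short_feat) + list(long_feat)
    if len(weights) != len(concat):
        raise ValueError("weights must match the concatenated length")
    # Gate near 1 favors the short-term features, near 0 the long-term ones.
    gate = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)
    fused = [gate * s + (1.0 - gate) * l for s, l in zip(short_feat, long_feat)]
    return fused, gate


# Toy inputs (assumed values, not real video features).
short_feat = [1.0, 0.0]
long_feat = [0.0, 1.0]
zero_weights = [0.0, 0.0, 0.0, 0.0]
fused, gate = gated_fusion(short_feat, long_feat, zero_weights)
```

With all-zero weights and zero bias the gate is exactly 0.5, so the fused vector is the plain average of the two inputs; a learned gate would instead shift the balance per input.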

research
11/01/2021

LSTA-Net: Long short-term Spatio-Temporal Aggregation Network for Skeleton-based Action Recognition

Modelling various spatio-temporal dependencies is the key to recognising...
research
07/13/2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

Recent video recognition models utilize Transformer models for long-rang...
research
03/30/2021

Temporal Memory Relation Network for Workflow Recognition from Surgical Video

Automatic surgical workflow recognition is a key component for developin...
research
03/17/2021

Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

Real-time surgical phase recognition is a fundamental task in modern ope...
research
08/25/2022

Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

The existing source cell-phone recognition method lacks the long-term fe...
research
09/01/2020

Aggregating Long-Term Context for Learning Surgical Workflows

Analyzing surgical workflow is crucial for computers to understand surge...
research
09/04/2022

Hierarchical Transformer with Spatio-Temporal Context Aggregation for Next Point-of-Interest Recommendation

Next point-of-interest (POI) recommendation is a critical task in locati...
