Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

02/22/2023
by Chengxi Zeng, et al.

This paper presents a deep learning framework for medical video segmentation. Convolutional neural network (CNN) and transformer-based methods have achieved major milestones in medical image segmentation tasks owing to their strong semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data: the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and combines them with a temporal feature blender, which then tokenises the high-level spatio-temporal features to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced by a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving Dice coefficients of 0.8986 and 0.8186 on the two datasets tested. Our studies also demonstrate the efficacy of the temporal feature blending scheme and the cross-dataset transferability of the learned capabilities. Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.
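The abstract names three generic steps: blending features from neighbouring frames, tokenising the blended feature map for a transformer, and scoring predicted masks with the Dice coefficient. The NumPy sketch below illustrates each one under stated assumptions. The similarity-weighted blend is a hypothetical stand-in, not the authors' temporal feature blender. The function names `blend_temporal_features`, `tokenise` and `dice_coefficient` and the patch size of 4 are illustrative choices. The tokeniser is plain ViT-style patch flattening and omits the linear embedding and shifted-window attention that a Swin Transformer adds.

```python
import numpy as np


def blend_temporal_features(frames: np.ndarray) -> np.ndarray:
    """Blend per-frame feature maps of shape (T, C, H, W) into one (C, H, W) map.

    Hypothetical scheme (NOT the paper's blender): each frame is weighted by a
    softmax over its cosine similarity to the centre frame.
    """
    t = frames.shape[0]
    centre = frames[t // 2].ravel()
    flat = frames.reshape(t, -1)
    # Cosine similarity of every frame to the centre frame; epsilon avoids division by zero.
    norms = np.linalg.norm(flat, axis=1) * np.linalg.norm(centre) + 1e-8
    sims = flat @ centre / norms
    # Numerically stable softmax over the temporal axis.
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    # Weighted sum over frames: (T,) x (T, C, H, W) -> (C, H, W).
    return np.tensordot(weights, frames, axes=1)


def tokenise(feature: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) map into non-overlapping patch tokens of shape (N, C*patch*patch)."""
    c, h, w = feature.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    x = feature.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p), patches in row-major order
    return x.reshape(-1, c * patch * patch)


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A intersect B| / (|A| + |B|) for binary masks; eps keeps empty masks defined."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float((2.0 * intersection + eps) / (total + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal((5, 8, 16, 16))  # 5 neighbouring frames, 8 channels each
    blended = blend_temporal_features(clip)
    tokens = tokenise(blended, patch=4)
    print(blended.shape, tokens.shape)  # (8, 16, 16) (16, 128)

    pred = np.array([[1, 1, 0], [0, 1, 0]])
    target = np.array([[1, 0, 0], [0, 1, 1]])
    print(round(dice_coefficient(pred, target), 4))  # 0.6667
```

Weighting frames by similarity to the centre frame is only one simple option: it lets frames that resemble the target frame dominate while down-weighting frames that change sharply, for example due to rapid motion. A learned blender would replace the fixed cosine-similarity weights with trainable parameters.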



research
01/21/2022

SegTransVAE: Hybrid CNN – Transformer with Regularization for medical image segmentation

Current research on deep learning for medical image segmentation exposes...
research
08/17/2022

Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation

We propose Video-TransUNet, a deep architecture for instance segmentatio...
research
03/24/2022

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) app...
research
09/27/2022

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

Spatio-Temporal video grounding (STVG) focuses on retrieving the spatio-...
research
03/21/2023

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

Accurate 3D mitochondria instance segmentation in electron microscopy (E...
research
08/10/2023

Temporally-Adaptive Models for Efficient Video Understanding

Spatial convolutions are extensively used in numerous deep video models....
research
08/26/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

The task of object segmentation in videos is usually accomplished by pro...
