Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation

08/17/2022
by Chengxi Zeng, et al.

We propose Video-TransUNet, a deep architecture for instance segmentation in medical CT videos, constructed by integrating temporal feature blending into the TransUNet deep learning framework. In particular, our approach amalgamates strong frame representation via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module (TCM), non-local attention via a Vision Transformer, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconvolutional architecture with multiple heads. We show that this new network design can significantly outperform other state-of-the-art systems when tested on the segmentation of bolus and pharynx/larynx in Videofluoroscopic Swallowing Study (VFSS) CT sequences. On our VFSS2022 dataset it achieves a Dice coefficient of 0.8796 and an average surface distance of 1.0379 pixels. Tracking the pharyngeal bolus accurately is a particularly important application in clinical practice, since it constitutes the primary method for diagnosing swallowing impairment. Our findings suggest that the proposed model can indeed enhance the TransUNet architecture by exploiting temporal information, improving segmentation performance by a significant margin. We publish key source code, network weights, and ground truth annotations to simplify reproduction of our results.
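The two metrics quoted above, the Dice coefficient and the average surface distance (ASD), can be made concrete with a short sketch. The snippet below is a minimal, stdlib-only Python illustration under stated assumptions: masks are binary nested lists, boundary pixels are foreground pixels with a 4-connected background or off-image neighbour, and ASD is taken as the symmetric mean of nearest boundary-to-boundary distances in both directions. The paper's own evaluation code may define the boundary or the averaging differently. The names `dice`, `boundary`, `avg_surface_distance`, and `box` are hypothetical and are not taken from the published source code.

```python
import math
from typing import List, Set, Tuple

Mask = List[List[int]]  # binary mask: 1 = foreground (e.g. bolus), 0 = background
Pixel = Tuple[int, int]  # (row, column)


def dice(pred: Mask, truth: Mask) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); returns 1.0 when both masks are empty."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if total == 0 else 2.0 * inter / total


def boundary(mask: Mask) -> Set[Pixel]:
    """Assumed boundary: foreground pixels with a 4-connected background/off-image neighbour."""
    h, w = len(mask), len(mask[0])
    edge: Set[Pixel] = set()
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    edge.add((r, c))
                    break
    return edge


def avg_surface_distance(pred: Mask, truth: Mask) -> float:
    """Symmetric ASD in pixels: mean nearest-boundary distance, taken in both directions."""
    a, b = boundary(pred), boundary(truth)
    if not a or not b:
        raise ValueError("ASD is undefined when a mask has no foreground")

    def nearest(p: Pixel, others: Set[Pixel]) -> float:
        # Brute-force nearest-neighbour search (Euclidean distance).
        return min(math.dist(p, q) for q in others)

    dists = [nearest(p, b) for p in a] + [nearest(q, a) for q in b]
    return sum(dists) / len(dists)


def box(h: int, w: int, r0: int, r1: int, c0: int, c1: int) -> Mask:
    """Hypothetical helper: h x w mask with an inclusive rectangle of ones."""
    return [[int(r0 <= r <= r1 and c0 <= c <= c1) for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    truth = box(6, 6, 1, 3, 1, 3)  # 3x3 square
    pred = box(6, 6, 1, 3, 2, 4)   # same square shifted one pixel to the right
    print(round(dice(pred, truth), 4))        # 0.6667
    print(avg_surface_distance(pred, truth))  # 0.5
```

The brute-force nearest-neighbour search costs O(|∂A|·|∂B|), which is fine for illustration; evaluation on full-resolution video frames would more typically use a distance transform.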

research
12/15/2021

SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation

In this work, we present SeqFormer, a frustratingly simple model for vid...
research
02/22/2023

Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

This paper presents a deep learning framework for medical video segmenta...
research
04/18/2022

Temporally Efficient Vision Transformer for Video Instance Segmentation

Recently vision transformer has achieved tremendous success on image-lev...
research
06/12/2021

1st Place Solution for YouTubeVOS Challenge 2021:Video Instance Segmentation

Video Instance Segmentation (VIS) is a multi-task problem performing det...
research
07/28/2021

Improving Video Instance Segmentation via Temporal Pyramid Routing

Video Instance Segmentation (VIS) is a new and inherently multi-task pro...
research
08/01/2018

Recurrent neural networks for aortic image sequence segmentation with sparse annotations

Segmentation of image sequences is an important task in medical image an...
research
08/20/2021

BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies

In this paper we propose BlockCopy, a scheme that accelerates pretrained...
