A Volumetric Transformer for Accurate 3D Tumor Segmentation

by Himashi Peiris, et al.
Monash University

This paper presents a Transformer architecture for volumetric medical image segmentation. Designing a computationally efficient Transformer architecture for volumetric segmentation is challenging. It requires a careful balance between encoding local and global spatial cues, while also preserving information along all axes of the volumetric data. The proposed volumetric Transformer has a U-shaped encoder-decoder design that processes the input voxels in their entirety. Our encoder has two consecutive self-attention layers to simultaneously encode local and global cues. Our decoder has novel parallel shifted-window-based self- and cross-attention blocks that incorporate Fourier position encoding to capture fine details for boundary refinement. These design choices result in a computationally efficient architecture that demonstrates promising results for tumor segmentation on the Brain Tumor Segmentation (BraTS) 2021 and Medical Segmentation Decathlon (Pancreas and Liver) datasets. We further show that the representations learned by our model transfer better across datasets and are robust against data corruptions. Our code implementation is publicly available at https://github.com/himashi92/VT-UNet.
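To make "shifted window" attention and Fourier position encoding concrete for volumetric data, here is a minimal pure-Python sketch. It assumes Swin-style non-overlapping 3D windows with a cyclic shift, and a common sinusoidal form of Fourier features over normalized voxel coordinates. The function names, window size, shift, and number of frequency bands are hypothetical illustrations, not the authors' VT-UNet code; the paper's exact formulation may differ.

```python
import math
from collections import defaultdict


def window_partition_3d(shape, window, shift=(0, 0, 0)):
    """Group voxel coordinates of a D x H x W volume into non-overlapping 3D windows.

    Hypothetical helper (Swin-style assumption): a nonzero shift cyclically
    rolls the volume by -shift first, so the new window grid straddles the
    boundaries of the unshifted grid and lets neighboring windows exchange
    information in the next attention layer.
    """
    D, H, W = shape
    wd, wh, ww = window
    assert D % wd == 0 and H % wh == 0 and W % ww == 0, "volume must tile evenly"
    windows = defaultdict(list)
    for d in range(D):
        for h in range(H):
            for w in range(W):
                # Position of this voxel after rolling the volume by -shift.
                sd = (d - shift[0]) % D
                sh = (h - shift[1]) % H
                sw = (w - shift[2]) % W
                windows[(sd // wd, sh // wh, sw // ww)].append((d, h, w))
    # Note: real implementations also mask attention between voxels that only
    # became window-mates through wrap-around; that masking is omitted here.
    return dict(windows)


def fourier_position_encoding(coord, shape, num_bands):
    """Sinusoidal Fourier features for one voxel coordinate (hypothetical form).

    Each axis coordinate is normalized to [-1, 1], then encoded as
    sin/cos pairs at frequencies pi * 2**k for k = 0 .. num_bands - 1.
    """
    feats = []
    for c, n in zip(coord, shape):
        x = 2.0 * c / (n - 1) - 1.0 if n > 1 else 0.0
        for k in range(num_bands):
            freq = math.pi * (2 ** k)
            feats.extend((math.sin(freq * x), math.cos(freq * x)))
    return feats


if __name__ == "__main__":
    shape, window = (4, 4, 4), (2, 2, 2)
    regular = window_partition_3d(shape, window)
    shifted = window_partition_3d(shape, window, shift=(1, 1, 1))
    print(len(regular), len(shifted))  # 8 8
```

With a 4 x 4 x 4 volume and 2 x 2 x 2 windows, both partitions contain 8 windows of 8 voxels, but the shifted window at grid index (0, 0, 0) draws its voxels from all 8 unshifted windows. Alternating the two partitions across consecutive layers is what lets window-local attention propagate context beyond a single window.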



SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation

Accurately segmenting fluid in 3D volumetric optical coherence tomograph...

A Convolutional-Transformer Network for Crack Segmentation with Boundary Awareness

Cracks play a crucial role in assessing the safety and durability of man...

Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation

Recently, many attempts have been made to construct a transformer base U...

Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri

Recent advancements in Digital Document Restoration (DDR) have led to si...

Real-Time Target Sound Extraction

We present the first neural network model to achieve real-time and strea...

Masked Autoencoders with Multi-Window Attention Are Better Audio Learners

Several recent works have adapted Masked Autoencoders (MAEs) for learnin...
