A Multi-scale Transformer for Medical Image Segmentation: Architectures, Model Efficiency, and Benchmarks

02/28/2022
by Yunhe Gao, et al.

Transformers have proven successful in a number of natural language processing and vision tasks, but their potential applications to medical imaging remain largely unexplored due to the unique difficulties of this field. In this study, we present UTNetV2, a simple yet powerful backbone model that combines the strengths of convolutional neural networks (CNNs) and Transformers to improve performance and efficiency in medical image segmentation. The design of UTNetV2 rests on three innovations: (1) We use a hybrid hierarchical architecture that introduces depthwise separable convolution into the projection and feed-forward network of the Transformer block. This brings local relationship modeling and desirable properties of CNNs (translation invariance) to the Transformer, thus eliminating the requirement for large-scale pre-training. (2) We propose efficient bidirectional attention (B-MHA), which reduces the quadratic computational complexity of self-attention to linear by introducing an adaptively updated semantic map. This efficient attention makes it possible to capture long-range relationships and correct fine-grained errors in high-resolution token maps. (3) The semantic maps in B-MHA allow us to perform semantically and spatially global multi-scale feature fusion without introducing much computational overhead. Furthermore, we provide a codebase for fair comparison of CNN-based and Transformer-based models on various medical image segmentation tasks, to evaluate the strengths and weaknesses of both architectures. UTNetV2 demonstrates state-of-the-art performance across various settings, including large-scale datasets, small-scale datasets, and 2D and 3D settings.
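
The quadratic-to-linear claim in point (2) can be illustrated with a simple count of attention-score entries. The sketch below is not the authors' B-MHA implementation, which the abstract does not specify. It assumes a two-way exchange between N image tokens and a fixed-size semantic map of M entries, and the function names and the map size of 64 are hypothetical choices for illustration only.

```python
def full_attention_scores(num_tokens: int) -> int:
    """Score-matrix entries for standard self-attention.

    Every token attends to every other token, so the count is N * N.
    """
    return num_tokens * num_tokens


def semantic_map_attention_scores(num_tokens: int, map_size: int) -> int:
    """Score-matrix entries when tokens only exchange information with a map.

    Assumed two-way scheme (an illustration, not the paper's exact design):
      - the map attends to the tokens:  map_size x num_tokens entries
      - the tokens attend to the map:   num_tokens x map_size entries
    With map_size fixed, the total grows linearly in num_tokens.
    """
    return 2 * num_tokens * map_size


if __name__ == "__main__":
    map_size = 64  # hypothetical semantic-map size, not taken from the paper
    for side in (32, 64, 128, 256):
        n = side * side  # tokens in a side x side feature map
        full = full_attention_scores(n)
        cheap = semantic_map_attention_scores(n, map_size)
        print(f"{side}x{side} tokens: full={full:,}  with map={cheap:,}  "
              f"ratio={full / cheap:.1f}")
```

Under these assumptions the saving is a factor of N / (2M), so it grows with resolution, which is consistent with the abstract's point that efficient attention becomes practical on high-resolution token maps.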


research
07/02/2021

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Transformer architecture has emerged to be successful in a number of nat...
research
03/15/2022

HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction

In accelerated MRI reconstruction, the anatomy of a patient is recovered...
research
09/15/2021

MISSFormer: An Effective Medical Image Segmentation Transformer

The CNN-based methods have achieved impressive results in medical image ...
research
03/29/2023

Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation

Transformers have shown great success in medical image segmentation. How...
research
11/13/2022

Learning from partially labeled data for multi-organ and tumor segmentation

Medical image benchmarks for the segmentation of organs and tumors suffe...
research
10/14/2022

Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation

The adaptation of transformers to computer vision is not straightforward...
research
12/21/2022

Investigation of Network Architecture for Multimodal Head-and-Neck Tumor Segmentation

Inspired by the recent success of Transformers for Natural Language Proc...
