ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation

09/09/2023
by   Xian Lin, et al.

Transformers have been extensively studied in medical image segmentation to build pairwise long-range dependencies. Yet, the relatively limited amount of well-annotated medical image data makes transformers struggle to extract diverse global features, resulting in attention collapse, where attention maps become similar or even identical. Comparatively, convolutional neural networks (CNNs) have better convergence properties on small-scale training data but suffer from limited receptive fields. Existing works are dedicated to exploring combinations of CNNs and transformers while ignoring attention collapse, leaving the potential of transformers under-explored. In this paper, we propose to build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance. Specifically, ConvFormer consists of pooling, CNN-style self-attention (CSA), and a convolutional feed-forward network (CFFN), corresponding to tokenization, self-attention, and the feed-forward network in vanilla vision transformers. In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling to both preserve position information and reduce feature size. In this way, CSA takes 2D feature maps as inputs and establishes long-range dependencies by constructing self-attention matrices as convolution kernels with adaptive sizes. Following CSA, 2D convolution is utilized for feature refinement through the CFFN. Experimental results on multiple datasets demonstrate the effectiveness of ConvFormer as a plug-and-play module that consistently improves transformer-based frameworks. Code is available at https://github.com/xianlin7/ConvFormer.
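To make the block structure concrete, below is a minimal sketch of how such a CNN-style transformer block might be wired, assuming a PyTorch-style implementation. It is not the authors' code: the class and parameter names (CNNStyleBlock, dim, heads) are illustrative, and the paper's adaptive-kernel CSA is simplified here to ordinary spatial self-attention over the pooled 2D feature map; only the overall layout (conv/pool tokenization without positional embedding, self-attention, convolutional feed-forward refinement) follows the description above.

    # Illustrative sketch only, not the authors' implementation.
    import torch
    import torch.nn as nn


    class CNNStyleBlock(nn.Module):
        """Conv/pool tokenization -> spatial self-attention -> convolutional FFN."""

        def __init__(self, in_ch, dim, heads=4):
            super().__init__()
            # Tokenization: 2D convolution + max-pooling; positions stay implicit
            # in the 2D layout, so no positional embedding is added.
            self.tokenize = nn.Sequential(
                nn.Conv2d(in_ch, dim, kernel_size=3, padding=1),
                nn.MaxPool2d(kernel_size=2),
            )
            self.heads = heads
            self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)  # 1x1 convs replace linear projections
            self.proj = nn.Conv2d(dim, dim, kernel_size=1)
            self.norm1 = nn.GroupNorm(1, dim)
            self.norm2 = nn.GroupNorm(1, dim)
            # Convolutional feed-forward network (CFFN): feature refinement with 2D convs.
            self.cffn = nn.Sequential(
                nn.Conv2d(dim, dim * 4, kernel_size=1),
                nn.Conv2d(dim * 4, dim * 4, kernel_size=3, padding=1, groups=dim * 4),
                nn.GELU(),
                nn.Conv2d(dim * 4, dim, kernel_size=1),
            )

        def attention(self, x):
            # Plain spatial self-attention over the pooled 2D feature map.
            # The paper's CSA instead builds attention matrices as convolution
            # kernels with adaptive sizes; that mechanism is simplified here.
            b, c, h, w = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=1)

            def split(t):  # (B, C, H, W) -> (B, heads, H*W, C/heads)
                return t.reshape(b, self.heads, c // self.heads, h * w).transpose(-2, -1)

            q, k, v = map(split, (q, k, v))
            attn = torch.softmax(q @ k.transpose(-2, -1) / (c // self.heads) ** 0.5, dim=-1)
            out = (attn @ v).transpose(-2, -1).reshape(b, c, h, w)
            return self.proj(out)

        def forward(self, x):
            x = self.tokenize(x)                   # halve spatial size, keep 2D layout
            x = x + self.attention(self.norm1(x))  # self-attention with residual connection
            x = x + self.cffn(self.norm2(x))       # CFFN with residual connection
            return x


    # Usage: a 2D feature map stays a 2D feature map (at half resolution).
    feats = torch.randn(1, 32, 64, 64)
    block = CNNStyleBlock(in_ch=32, dim=64)
    print(block(feats).shape)  # torch.Size([1, 64, 32, 32])

Because the block consumes and produces ordinary 2D feature maps, it can be dropped between convolutional stages of an existing segmentation network, which is what makes the plug-and-play claim plausible.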
