Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation

10/14/2022
by Qianying Liu, et al.

Adapting transformers to computer vision is not straightforward because modelling image contextual information incurs computational complexity that is quadratic in the number of input features. Most existing methods require extensive pre-training on massive datasets such as ImageNet, which limits their applicability in fields such as healthcare. CNNs remain the dominant architecture in computer vision because convolutional filters effectively model local dependencies and drastically reduce the number of parameters required. However, convolutional filters cannot capture more complex interactions beyond a small neighbourhood of pixels. Furthermore, their weights are fixed after training, so they cannot adapt to changes in the visual input. Inspired by recent work on hybrid visual transformers with convolutions and on hierarchical transformers, we propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with respect to patch embedding, projection, the feed-forward network, upsampling and skip connections. CS-Unet can be trained from scratch and inherits the strengths of convolutions in every feature-processing phase. It encodes precise spatial information and produces hierarchical representations that contribute to object concepts at various scales. Experiments show that CS-Unet, without pre-training, surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets while using fewer parameters. In addition, two domain-adaptation experiments on optic disc and polyp image segmentation further demonstrate that our method is highly generalizable and effectively bridges the domain gap between images from different sources.
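The quadratic-cost claim above can be made concrete with a back-of-the-envelope flop count. The sketch below is a hypothetical illustration (not the paper's own analysis): it compares the approximate multiply-add cost of one global self-attention layer, which grows with the square of the token count, against a k×k convolution, which grows linearly with the number of pixels. The layer sizes (56×56 feature map, 96 channels) are assumed for illustration only.

```python
def attention_flops(n_tokens: int, dim: int) -> int:
    """Approximate multiply-adds for one global self-attention layer:
    computing QK^T costs n^2 * d, and applying the attention weights
    to V costs another n^2 * d."""
    return 2 * n_tokens * n_tokens * dim


def conv_flops(n_pixels: int, kernel: int, channels: int) -> int:
    """Approximate multiply-adds for one k x k convolution with equal
    input and output channel counts."""
    return n_pixels * kernel * kernel * channels * channels


# Doubling the image side length quadruples the token/pixel count, so
# global attention cost grows 16x while convolution cost grows only 4x.
attn_ratio = attention_flops(112 * 112, 96) // attention_flops(56 * 56, 96)
conv_ratio = conv_flops(112 * 112, 3, 96) // conv_flops(56 * 56, 3, 96)
print(attn_ratio, conv_ratio)  # 16 4
```

This gap is what motivates hierarchical, window-based designs such as Swin and hybrid convolution-transformer blocks: they keep attention local so the cost stays closer to the convolutional scaling.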

