3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume

04/14/2022
by Jianye Pang, et al.

Dense prediction in medical volumes provides enriched guidance for clinical analysis. CNN backbones have hit a bottleneck because they lack long-range dependencies and global context modeling power. Recent works have proposed combining the vision transformer with CNNs, owing to the transformer's strong global capture ability and learning capability. However, most of these works simply apply a pure transformer, which has several fatal flaws (i.e., lack of inductive bias, heavy computation, and little consideration for 3D data). Therefore, designing an elegant and efficient vision transformer learner for dense prediction in medical volumes is both promising and challenging. In this paper, we propose a novel 3D Shuffle-Mixer network, built on a new Local Vision Transformer-MLP paradigm, for medical dense prediction. In our network, a local vision transformer block shuffles and learns spatial context from full-view slices of the rearranged volume, a residual axial-MLP mixes and captures the remaining volume context in a slice-aware manner, and an MLP view aggregator projects the learned full-view rich context onto the volume feature in a view-aware manner. Moreover, an Adaptive Scaled Enhanced Shortcut is proposed for the local vision transformer to adaptively enhance features along the spatial and channel dimensions, and a CrossMerge is proposed to skip-connect the multi-scale features appropriately in the pyramid architecture. Extensive experiments demonstrate that the proposed model outperforms other state-of-the-art medical dense prediction methods.
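As a rough illustration of what "full-view slices of rearranged volume" could mean, the sketch below rearranges a small D x H x W volume into stacks of 2D slices along each of its three axes and then restores it. This is not the authors' implementation: it assumes that "full-view" refers to the three orthogonal slicing directions, it uses nested Python lists in place of feature tensors, and the helper names `slices_along` and `restore` are hypothetical.

```python
def slices_along(volume, axis):
    """Return the 2D slices of a D x H x W nested-list volume taken along `axis`."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # D slices, each H x W
        return [[row[:] for row in volume[z]] for z in range(d)]
    if axis == 1:  # H slices, each D x W
        return [[volume[z][y][:] for z in range(d)] for y in range(h)]
    if axis == 2:  # W slices, each D x H
        return [[[volume[z][y][x] for y in range(h)] for z in range(d)]
                for x in range(w)]
    raise ValueError("axis must be 0, 1, or 2")


def restore(slices, axis):
    """Invert slices_along: rebuild the D x H x W volume from one view's slices."""
    if axis == 0:
        return [[row[:] for row in s] for s in slices]
    if axis == 1:  # slices[y][z][x] -> volume[z][y][x]
        h, d = len(slices), len(slices[0])
        return [[slices[y][z][:] for y in range(h)] for z in range(d)]
    if axis == 2:  # slices[x][z][y] -> volume[z][y][x]
        w, d, h = len(slices), len(slices[0]), len(slices[0][0])
        return [[[slices[x][z][y] for x in range(w)] for y in range(h)]
                for z in range(d)]
    raise ValueError("axis must be 0, 1, or 2")


# Toy volume with D=2, H=3, W=4; each voxel encodes its own (z, y, x) index.
volume = [[[z * 100 + y * 10 + x for x in range(4)] for y in range(3)]
          for z in range(2)]

# One stack of 2D slices per view; a 2D block could process each stack in turn.
views = {axis: slices_along(volume, axis) for axis in range(3)}
shapes = {axis: (len(v), len(v[0]), len(v[0][0])) for axis, v in views.items()}
```

In this sketch each view yields a different number of slices with a different 2D shape, so any per-view result has to be mapped back to the common D x H x W layout (here via `restore`) before the views can be combined. The way the paper's MLP view aggregator combines them is not described in the abstract and is not modeled here.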


research
09/15/2021

MISSFormer: An Effective Medical Image Segmentation Transformer

The CNN-based methods have achieved impressive results in medical image ...
research
11/15/2022

ConvFormer: Combining CNN and Transformer for Medical Image Segmentation

Convolutional neural network (CNN) based methods have achieved great suc...
research
08/10/2022

Ghost-free High Dynamic Range Imaging with Context-aware Transformer

High dynamic range (HDR) deghosting algorithms aim to generate ghost-fre...
research
04/25/2023

STM-UNet: An Efficient U-shaped Architecture Based on Swin Transformer and Multi-scale MLP for Medical Image Segmentation

Automated medical image segmentation can assist doctors to diagnose fast...
research
07/12/2023

UGCANet: A Unified Global Context-Aware Transformer-based Network with Feature Alignment for Endoscopic Image Analysis

Gastrointestinal endoscopy is a medical procedure that utilizes a flexib...
research
06/08/2023

Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

Given the increasing volume and quality of genomics data, extracting new...
research
07/01/2019

Global Transformer U-Nets for Label-Free Prediction of Fluorescence Images

Visualizing the details of different cellular structures is of great imp...
