Multi-Compound Transformer for Accurate Biomedical Image Segmentation

06/28/2021
by Yuanfeng Ji, et al.

Recent vision transformers (i.e., for image classification) learn non-local attentive interactions among different patch tokens. However, prior work fails to learn the cross-scale dependencies of different pixels, the semantic correspondence of different labels, and the consistency between feature representations and semantic embeddings, all of which are critical for biomedical segmentation. In this paper, we tackle these issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a single framework. Specifically, MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention of previous works. In addition, a learnable proxy embedding is introduced to model semantic relationships and to enhance features, using self-attention and cross-attention, respectively. MCTrans can be easily plugged into a UNet-like network and attains a significant improvement over state-of-the-art methods in biomedical image segmentation on six standard benchmarks. For example, MCTrans outperforms UNet by margins that include 3.64% and 3.71%, on benchmarks that include the Kvasir and ISIC2018 datasets. Code is available at https://github.com/JiYuanFeng/MCTrans.
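
The two attention stages described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that): it assumes plain single-head scaled dot-product attention with no learned projections, assumes every scale already has the same channel width, and uses hypothetical names (`tokens_from_scales`, `attention`, `proxies`) and random inputs chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Plain scaled dot-product attention (single head, no learned weights).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values

def tokens_from_scales(feature_maps):
    # Flatten each (C, H, W) map into (H*W, C) tokens, then concatenate
    # the tokens of all scales into one sequence.
    return np.concatenate(
        [f.reshape(f.shape[0], -1).T for f in feature_maps], axis=0
    )

rng = np.random.default_rng(0)
channels, num_classes = 16, 3  # assumed sizes, for illustration only

# Stand-ins for multi-scale convolutional features (8x8, 4x4, 2x2).
feature_maps = [rng.standard_normal((channels, s, s)) for s in (8, 4, 2)]

# Stage 1: one token sequence over all scales, so self-attention mixes
# tokens both within a scale and across scales.
tokens = tokens_from_scales(feature_maps)
tokens = attention(tokens, tokens, tokens)

# Stage 2: one proxy embedding per class (random here; learnable in a
# real model). Self-attention relates the proxies to each other, and
# cross-attention lets each proxy gather evidence from the image tokens.
proxies = rng.standard_normal((num_classes, channels))
proxies = attention(proxies, proxies, proxies)
proxies = attention(proxies, tokens, tokens)

print(tokens.shape, proxies.shape)
```

Concatenating the scales into a single sequence is what makes the attention inter-scale in this sketch; attending within each scale separately would recover the single-scale behavior the abstract contrasts against.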

