SSformer: A Lightweight Transformer for Semantic Segmentation

08/03/2022
by   Wentao Shi, et al.

It is widely believed that Transformers perform better than convolutional neural networks in semantic segmentation. Nevertheless, the original Vision Transformer may lack the inductive biases of local neighborhoods and suffers from high time complexity. Recently, the Swin Transformer has set new records in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, as the Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Further, simply combining the Swin Transformer with existing methods would increase the size and parameter count of the final segmentation model. In this paper, we rethink the Swin Transformer for semantic segmentation and design a lightweight yet effective transformer model, called SSformer. In this model, considering the inherent hierarchical design of the Swin Transformer, we propose a decoder that aggregates information from different layers, thus obtaining both local and global attention. Experimental results show that the proposed SSformer yields mIoU performance comparable to state-of-the-art models, while maintaining a smaller model size and lower computational cost.
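The abstract does not spell out the decoder's structure. Purely as an illustrative sketch of how a lightweight decoder might aggregate the hierarchical features produced by a Swin-style backbone, the PyTorch snippet below projects each backbone stage to a shared embedding dimension, upsamples to the highest-resolution stage, and fuses the result. The channel counts (96/192/384/768, typical of Swin-T), the embedding dimension, the class count, and the 1x1-convolution fusion are assumptions for illustration, not the exact SSformer design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightAggregationDecoder(nn.Module):
    """Illustrative decoder fusing hierarchical backbone features.

    Channel counts follow a typical Swin-T backbone; the embedding
    dimension and fusion strategy are assumptions, not the exact
    SSformer architecture.
    """

    def __init__(self, in_channels=(96, 192, 384, 768), embed_dim=256, num_classes=19):
        super().__init__()
        # Project each stage's features to a shared embedding dimension.
        self.projections = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels
        )
        # Fuse the concatenated multi-scale features, then predict per-pixel classes.
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, kernel_size=1)
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, features):
        # features: list of tensors ordered from shallow (high-res) to deep (low-res) stages.
        target_size = features[0].shape[2:]
        projected = [
            F.interpolate(proj(f), size=target_size, mode="bilinear", align_corners=False)
            for proj, f in zip(self.projections, features)
        ]
        fused = self.fuse(torch.cat(projected, dim=1))
        return self.classifier(fused)


if __name__ == "__main__":
    # Dummy multi-scale features, e.g. from a 512x512 input through a Swin-like backbone.
    feats = [
        torch.randn(1, 96, 128, 128),
        torch.randn(1, 192, 64, 64),
        torch.randn(1, 384, 32, 32),
        torch.randn(1, 768, 16, 16),
    ]
    logits = LightweightAggregationDecoder()(feats)
    print(logits.shape)  # torch.Size([1, 19, 128, 128])
```

Combining shallow (local, high-resolution) and deep (global, low-resolution) stages in this way is one plausible reading of "aggregating information from different layers"; the paper should be consulted for the actual decoder.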
