IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation

12/06/2022
by   Lihua Fu, et al.

Semantic segmentation usually benefits from global context, fine localisation information, and multi-scale features. To advance Transformer-based segmenters in these respects, we present a simple yet powerful semantic segmentation architecture, termed IncepFormer. IncepFormer makes two key contributions. First, it introduces a novel pyramid-structured Transformer encoder that harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for the final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions and a lightweight feed-forward module in each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that our IncepFormer is superior to state-of-the-art methods in both accuracy and speed. For example: 1) our IncepFormer-S achieves 47.7% mIoU on ADE20K, outperforming the existing best method by 1% with fewer parameters and fewer FLOPs; 2) our IncepFormer-B achieves 82.0% mIoU on the Cityscapes dataset with 39.6M parameters. Code is available at github.com/shendu0321/IncepFormer.
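
The pyramid pooling named in the title can be illustrated with a minimal pure-Python sketch. It assumes the common formulation in which a feature map is average-pooled into a few coarse grids and the pooled cells are concatenated into a shorter token sequence for attention. The abstract does not spell out IncepFormer's actual design, so the function names, the bin sizes (1, 2, 4), and the single-channel 8x8 input are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(feature: Grid, out_size: int) -> Grid:
    """Average-pool a 2D map into out_size x out_size bins.

    Bin edges use floor for the start and ceil for the end, so every
    bin covers at least one cell even when sizes do not divide evenly.
    """
    h, w = len(feature), len(feature[0])
    pooled: Grid = []
    for i in range(out_size):
        r0 = (i * h) // out_size
        r1 = -((-(i + 1) * h) // out_size)  # ceil((i + 1) * h / out_size)
        row: List[float] = []
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = -((-(j + 1) * w) // out_size)  # ceil((j + 1) * w / out_size)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool_tokens(feature: Grid, bin_sizes: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Pool at several scales and concatenate the results into one token list.

    Assumption: the pooled tokens would stand in for the full-resolution
    keys/values of an attention layer, shrinking its cost.
    """
    tokens: List[float] = []
    for size in bin_sizes:
        for row in adaptive_avg_pool(feature, size):
            tokens.extend(row)
    return tokens


# Hypothetical single-channel 8x8 feature map holding the values 0..63.
feature = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = pyramid_pool_tokens(feature)
print(len(tokens))  # 1 + 4 + 16 pooled tokens instead of 64 positions
```

With these bin sizes, the 64 spatial positions collapse to 21 tokens that mix a global average with progressively finer local averages, which is one way to combine global context with multi-scale information at reduced cost.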

research
01/05/2022

Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention

Multi-scale representations are crucial for semantic segmentation. The c...
research
06/21/2022

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

In the pursuit of achieving ever-increasing accuracy, large and complex ...
research
01/11/2022

Pyramid Fusion Transformer for Semantic Segmentation

The recently proposed MaskFormer gives a refreshed perspective on...
research
11/26/2021

Efficient Self-Ensemble Framework for Semantic Segmentation

Ensemble of predictions is known to perform better than individual predi...
research
04/07/2022

Coarse-to-Fine Feature Mining for Video Semantic Segmentation

The contextual information plays a core role in semantic segmentation. A...
research
03/26/2022

Semantic Segmentation by Early Region Proxy

Typical vision backbones manipulate structured features. As a compromise...
research
06/25/2022

CV 3315 Is All You Need : Semantic Segmentation Competition

This competition focuses on Urban-Sense Segmentation based on the vehicle ...
