IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation

by Lihua Fu, et al.

Semantic segmentation usually benefits from global contexts, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters with these aspects, we present a simple yet powerful semantic segmentation architecture, termed IncepFormer. IncepFormer makes two critical contributions. First, it introduces a novel pyramid-structured Transformer encoder that harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions and a light-weight feed-forward module in each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that IncepFormer is superior to state-of-the-art methods in both accuracy and speed, e.g., 1) IncepFormer-S achieves 47.7% mIoU on ADE20K, outperforming the existing best method by 1% with fewer parameters and fewer FLOPs; 2) IncepFormer-B achieves 82.0% mIoU on the Cityscapes dataset with 39.6M parameters. Code is available in the official repository.
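
The fusion step described in the abstract (multi-scale features concatenated and passed through a convolution layer for per-pixel prediction) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `upsample_nearest` and `fuse_and_classify`, the nearest-neighbour upsampling, the use of a 1x1 convolution, and the toy weights are all assumptions made for illustration only.

```python
from typing import List

# A feature map is a nested list indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resize of every channel to (out_h, out_w)."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
         for y in range(out_h)]
        for ch in fmap
    ]


def fuse_and_classify(features: List[FeatureMap],
                      weights: List[List[float]],
                      bias: List[float]) -> List[List[int]]:
    """Concatenate multi-scale features and apply a 1x1 convolution.

    weights[k] holds one weight per fused channel for class k (hypothetical
    values; a real model would learn them). Returns the arg-max class per pixel.
    """
    # Bring every scale to the finest resolution present.
    out_h = max(len(f[0]) for f in features)
    out_w = max(len(f[0][0]) for f in features)
    # Concatenate along the channel axis.
    fused = [ch for f in features for ch in upsample_nearest(f, out_h, out_w)]
    assert all(len(w) == len(fused) for w in weights), "one weight per channel"

    labels = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            pixel = [ch[y][x] for ch in fused]
            # A 1x1 convolution is a per-pixel linear map over channels.
            logits = [sum(wi * pi for wi, pi in zip(w, pixel)) + b
                      for w, b in zip(weights, bias)]
            row.append(max(range(len(logits)), key=logits.__getitem__))
        labels.append(row)
    return labels


# Toy example: one fine (4x4) and one coarse (2x2) single-channel feature map.
fine = [[[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]
coarse = [[[0, 1], [0, 0]]]
weights = [[0, 0], [1, 0], [0, 1]]  # class 0: background, 1: fine, 2: coarse
bias = [0.5, 0, 0]
labels = fuse_and_classify([fine, coarse], weights, bias)
print(labels[0])  # [1, 0, 2, 2]
```

In this toy setup the fine-scale map contributes a single sharply localised pixel, while the coarse map contributes a broader region after upsampling, which loosely mirrors the combination of fine localisation and wider context that the abstract describes.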




EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

In the pursuit of achieving ever-increasing accuracy, large and complex ...

Pyramid Fusion Transformer for Semantic Segmentation

The recently proposed MaskFormer <cit.> gives a refreshed perspective on...

Efficient Self-Ensemble Framework for Semantic Segmentation

Ensemble of predictions is known to perform better than individual predi...

Coarse-to-Fine Feature Mining for Video Semantic Segmentation

The contextual information plays a core role in semantic segmentation. A...

Semantic Segmentation by Early Region Proxy

Typical vision backbones manipulate structured features. As a compromise...

CV 3315 Is All You Need : Semantic Segmentation Competition

This competition focus on Urban-Sense Segmentation based on the vehicle ...

Code Repositories


IncepFormer Official repo

