Revisiting Multi-Scale Feature Fusion for Semantic Segmentation

03/23/2022
by Tianjian Meng, et al.

It is commonly believed that high internal resolution combined with expensive operations (e.g., atrous convolutions) is necessary for accurate semantic segmentation, resulting in slow speed and large memory usage. In this paper, we question this belief and demonstrate that neither high internal resolution nor atrous convolutions are necessary. Our intuition is that although segmentation is a dense per-pixel prediction task, the semantics of each pixel often depend on both nearby neighbors and far-away context; therefore, a more powerful multi-scale feature fusion network plays a critical role. Following this intuition, we revisit the conventional multi-scale feature space (typically capped at P5) and extend it to a much richer space, up to P9, where the smallest features are only 1/512 of the input size and thus have very large receptive fields. To process such a rich feature space, we leverage the recent BiFPN to fuse the multi-scale features. Based on these insights, we develop a simplified segmentation model, named ESeg, which has neither high internal resolution nor expensive atrous convolutions. Perhaps surprisingly, our simple method can achieve better accuracy with faster speed than prior art across multiple datasets. In real-time settings, ESeg-Lite-S achieves 76.0% mIoU on CityScapes [12] at 189 FPS, outperforming FasterSeg [9] (73.1% mIoU). Our ESeg-Lite-L runs at 79 FPS and achieves 80.1% mIoU, largely closing the gap between real-time and high-performance segmentation models.
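To make the "1/512 of the input size" figure concrete, the following is a minimal sketch of the arithmetic behind the feature levels, assuming the usual convention that level P_l has a stride of 2^l relative to the input. The function name `pyramid_level_sizes`, the default starting level of 2, and the ceiling rounding are illustrative assumptions, not details taken from the paper.

```python
import math


def pyramid_level_sizes(height, width, min_level=2, max_level=9):
    """Map each feature level to (stride, feature_height, feature_width).

    Assumes level l downsamples the input by a factor of 2**l.
    Ceiling rounding is an assumption (it mimics 'same'-padded
    stride-2 downsampling); the paper may round differently.
    """
    sizes = {}
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        sizes[level] = (
            stride,
            math.ceil(height / stride),
            math.ceil(width / stride),
        )
    return sizes


if __name__ == "__main__":
    # 1024 x 2048 is the full Cityscapes image resolution.
    for level, (stride, h, w) in pyramid_level_sizes(1024, 2048).items():
        print(f"P{level}: stride {stride:>3}, feature map {h} x {w}")
```

Under these assumptions, a 1024 x 2048 input yields a 32 x 64 map at P5 but only a 2 x 4 map at P9, so each P9 cell summarizes a very large region of the image, which is the source of the large receptive field the abstract refers to.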


