Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

by Libo Wang, et al.

Semantic segmentation from very fine resolution (VFR) urban scene images plays a significant role in several application scenarios, including autonomous driving, land cover classification, and urban planning. However, the tremendous detail contained in VFR images severely limits the potential of existing deep learning approaches. More seriously, the considerable variations in the scale and appearance of objects further degrade the representational capacity of those semantic segmentation methods, leading to confusion between adjacent objects. Addressing such issues represents a promising research field in the remote sensing community, and paves the way for scene-level landscape pattern analysis and decision making. In this manuscript, we propose a bilateral awareness network (BANet), which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on stacked convolution operations. In addition, a feature aggregation module (FAM) based on the linear attention mechanism is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, the ISPRS Potsdam dataset, and the UAVid dataset, demonstrate the effectiveness of our BANet. Specifically, an mIoU of 64.6 is achieved on the UAVid dataset.
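The abstract reports its headline result as mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not taken from the paper: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions. Benchmark protocols typically accumulate counts over a whole test set and report the score as a percentage, which is presumably how the 64.6 figure should be read.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for two equal-length label sequences.

    Hypothetical helper for illustration only. Classes that appear in
    neither `pred` nor `target` are skipped (an assumption; evaluation
    protocols differ on this point).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    # Count per-class predictions, ground-truth labels, and agreements.
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. For real images, the 2-D label maps would be flattened to sequences before being passed in.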




Looking Outside the Window: Wider-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images

Long-range context information is crucial for the semantic segmentation ...

Feature Pyramid Network with Multi-Head Attention for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

Semantic segmentation from fine-resolution remotely sensed images is an ...

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

Deep learning approaches have shown promising results in remote sensing ...

Long-Range Correlation Supervision for Land-Cover Classification from Remote Sensing Images

Long-range dependency modeling has been widely considered in modern deep...

Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images

The attention mechanism can refine the extracted feature maps and boost ...

Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Domain-generalized urban-scene semantic segmentation (USSS) aims to lear...

ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions

We present ASSET, a neural architecture for automatically modifying an i...
