Self-attention on Multi-Shifted Windows for Scene Segmentation

07/10/2022
by Litao Yu, et al.

Scene segmentation is a fundamental yet challenging problem in visual content understanding: the goal is to learn a model that assigns a categorical label to every pixel of an image. One challenge in this task is capturing the spatial and semantic relationships needed to obtain descriptive feature representations, which is why learning feature maps at multiple scales is common practice in scene segmentation. In this paper, we explore the use of self-attention within multi-scale image windows to learn descriptive visual features, and we propose three different strategies for aggregating the resulting feature maps and decoding them for dense prediction. Our design is based on the recently proposed Swin Transformer models, which discard convolution operations entirely. With this simple yet effective multi-scale feature learning and aggregation, our models achieve promising performance on four public scene segmentation datasets: PASCAL VOC2012, COCO-Stuff 10K, ADE20K and Cityscapes.
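
As a rough illustration of the idea of self-attention restricted to windows of several sizes, the following minimal sketch in plain Python may help. It is not the authors' implementation. It assumes identity query/key/value projections, a single attention head, non-overlapping unshifted windows, and a simple average as the aggregation step, which stands in for the three aggregation strategies that the abstract mentions but does not describe. All function names are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def window_self_attention(tokens):
    # Single-head scaled dot-product self-attention over one window's tokens.
    # Assumption: identity query/key/value projections (no learned weights).
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(dim)])
    return out

def attend_in_windows(fmap, win):
    # Apply self-attention independently inside each non-overlapping
    # win x win window of an H x W x C feature map (nested lists).
    height, width = len(fmap), len(fmap[0])
    assert height % win == 0 and width % win == 0, "window must tile the map"
    out = [[None] * width for _ in range(height)]
    for top in range(0, height, win):
        for left in range(0, width, win):
            coords = [(top + i, left + j)
                      for i in range(win) for j in range(win)]
            attended = window_self_attention([fmap[y][x] for y, x in coords])
            for (y, x), vec in zip(coords, attended):
                out[y][x] = vec
    return out

def multi_scale_window_attention(fmap, window_sizes=(2, 4)):
    # Run window attention at each window size, then average the results.
    # Assumption: averaging is a placeholder aggregation, not the paper's.
    per_scale = [attend_in_windows(fmap, w) for w in window_sizes]
    height, width, dim = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(s[y][x][c] for s in per_scale) / len(per_scale)
              for c in range(dim)]
             for x in range(width)]
            for y in range(height)]

# Toy 4 x 4 feature map with 3 channels: (row index, column index, constant 1).
fmap = [[[float(y), float(x), 1.0] for x in range(4)] for y in range(4)]
fused = multi_scale_window_attention(fmap)
```

Each output vector is a convex combination of the input vectors in its window, so the constant third channel stays at 1 after fusion. Smaller windows mix only nearby positions, while larger windows bring in wider context.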

research
11/04/2020

Multi-layer Feature Aggregation for Deep Scene Parsing Models

Scene parsing from images is a fundamental yet challenging problem in vi...
research
08/03/2018

Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification

Local features at neighboring spatial positions in feature maps have hig...
research
09/26/2022

Baking in the Feature: Accelerating Volumetric Segmentation by Rendering Feature Maps

Methods have recently been proposed that densely segment 3D volumes into...
research
06/29/2021

An Efficient Cervical Whole Slide Image Analysis Framework Based on Multi-scale Semantic and Spatial Deep Features

Digital gigapixel whole slide image (WSI) is widely used in clinical dia...
research
08/26/2021

Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration

Active visual exploration aims to assist an agent with a limited field o...
research
04/18/2021

Multi-scale Self-calibrated Network for Image Light Source Transfer

Image light source transfer (LLST), as the most challenging task in the ...
research
01/19/2021

CAA : Channelized Axial Attention for Semantic Segmentation

Self-attention and channel attention, modelling the semantic interdepend...
