GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Pixel Labeling

05/27/2020
by Zhuoying Wang, et al.

Existing CNN-based methods for pixel labeling depend heavily on multi-scale features to satisfy the requirements of both semantic comprehension and detail preservation. State-of-the-art pixel labeling networks widely use conventional scale-transfer operations, i.e., up-sampling and down-sampling, to learn multi-scale features. In this work, we find that these operations lead to scale-confused features and suboptimal performance because they are spatially invariant and transfer all feature information across scales directly, without spatial selection. To address this issue, we propose the Gated Scale-Transfer Operation (GSTO), which transfers spatially filtered features to another scale. GSTO can work either with or without extra supervision: the unsupervised GSTO is learned from the feature itself, while the supervised one is guided by a supervised probability matrix. Both forms of GSTO are lightweight and plug-and-play, so they can be flexibly integrated into networks or modules to learn better multi-scale features. In particular, by plugging GSTO into HRNet, we obtain a more powerful backbone for pixel labeling (namely GSTO-HRNet). It achieves new state-of-the-art results on the COCO benchmark for human pose estimation and on semantic segmentation benchmarks including Cityscapes, LIP and Pascal Context, with negligible extra computational cost. Moreover, experimental results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules such as PPM and ASPP. Code will be made available at https://github.com/VDIGPKU/GSTO.
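The gating idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the gate is computed, so the sigmoid over a per-pixel linear projection of the channels, the 2x average-pool and nearest-neighbor resampling, and every function and parameter name here (`spatial_gate`, `gated_scale_transfer`, `weight`, `gate`) are assumptions made for illustration only. Passing an external `gate` stands in, loosely, for the supervised variant guided by a probability matrix.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_gate(feature, weight, bias=0.0):
    """Assumed unsupervised gate: one value in (0, 1) per spatial position.

    feature: array of shape (C, H, W)
    weight:  array of shape (C,), a hypothetical learned projection
             (equivalent to a 1x1 convolution with one output channel)
    Returns an array of shape (1, H, W).
    """
    logits = np.tensordot(weight, feature, axes=([0], [0])) + bias  # (H, W)
    return sigmoid(logits)[None, :, :]


def downsample_2x(feature):
    """2x2 average pooling; H and W must be even."""
    c, h, w = feature.shape
    return feature.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample_2x(feature):
    """Nearest-neighbor up-sampling by a factor of 2."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)


def gated_scale_transfer(feature, weight, bias=0.0, mode="down", gate=None):
    """Filter the feature spatially, then move it to another scale.

    If `gate` is given (shape (1, H, W), values in [0, 1]), it is used
    instead of the gate computed from the feature itself. This is an
    assumed stand-in for a gate derived from a supervised probability map.
    """
    if gate is None:
        gate = spatial_gate(feature, weight, bias)
    gated = feature * gate  # broadcast the gate over all channels
    if mode == "down":
        return downsample_2x(gated)
    if mode == "up":
        return upsample_2x(gated)
    raise ValueError("mode must be 'down' or 'up'")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))   # toy feature map, C=8
    w = rng.standard_normal(8) * 0.1        # hypothetical gate weights
    print(gated_scale_transfer(x, w, mode="down").shape)
    print(gated_scale_transfer(x, w, mode="up").shape)
```

The only difference from a conventional scale-transfer operation in this sketch is the multiplication by a per-position gate before resampling, which is what lets the operation pass some spatial locations to the other scale and suppress others.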


research
04/22/2022

Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation

A high-resolution network exhibits remarkable capability in extracting m...
research
03/26/2022

Feature Selective Transformer for Semantic Image Segmentation

Recently, it has attracted more and more attentions to fuse multi-scale ...
research
06/01/2021

Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation

To achieve more accurate 2D human pose estimation, we extend the success...
research
06/03/2021

Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation

Exploiting multi-scale features has shown great potential in tackling se...
research
11/01/2021

Dense Prediction with Attentive Feature Aggregation

Aggregating information from features across different layers is an esse...
research
07/11/2023

Compact Twice Fusion Network for Edge Detection

The significance of multi-scale features has been gradually recognized b...
research
06/29/2021

An Efficient Cervical Whole Slide Image Analysis Framework Based on Multi-scale Semantic and Spatial Deep Features

Digital gigapixel whole slide image (WSI) is widely used in clinical dia...
