Polarized Self-Attention: Towards High-quality Pixel-wise Regression

07/02/2021
by Huajun Liu, et al.

Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are very challenging, particularly because they require modeling long-range dependencies on high-resolution inputs/outputs at low computational overhead in order to estimate highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for boosting long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids try to reach the best compromise among multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing a non-linearity that directly fits the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D Binomial distribution (binary segmentation masks). PSA appears to have exhausted the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by 2-4 points and boosts state-of-the-art methods by 1-2 points on 2D pose estimation and semantic segmentation benchmarks.
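The polarized filtering idea (keep high internal resolution along one axis while fully collapsing the counterpart axis) can be illustrated with a small NumPy sketch. This is a minimal sketch under assumptions that the abstract does not state: every projection is a 1x1 convolution written as a channel-mixing matrix, the channel-only branch keeps C/2 internal channels and collapses the spatial dimensions with a softmax-weighted sum, the spatial-only branch collapses channels via global average pooling followed by a softmax, and each branch ends in a sigmoid (with a parameter-free LayerNorm before the channel sigmoid) as one possible softmax-sigmoid composition for the "Enhancement" step. The helper names (`channel_only`, `spatial_only`, `psa`, `init_params`) and the random weights are hypothetical and for illustration only; this is not the authors' released implementation.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along one axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(w, x):
    # A 1x1 convolution is a channel-mixing matrix: (C_out, C_in) applied to (C_in, H, W).
    return np.einsum("oc,chw->ohw", w, x)


def layer_norm(v, eps=1e-5):
    # Parameter-free LayerNorm (assumption: no learnable scale/shift).
    return (v - v.mean()) / np.sqrt(v.var() + eps)


def channel_only(x, p):
    """Hypothetical channel attention of shape (C, 1, 1); spatial dims fully collapsed."""
    c, h, w = x.shape
    q = conv1x1(p["ch_q"], x).reshape(1, h * w)       # channels collapsed to 1
    v = conv1x1(p["ch_v"], x).reshape(c // 2, h * w)  # keeps C/2 channel resolution
    z = v @ softmax(q, axis=1).T                      # (C/2, 1): softmax-weighted sum over HW
    a = sigmoid(layer_norm(p["ch_z"] @ z))            # (C, 1), values in (0, 1)
    return a.reshape(c, 1, 1)


def spatial_only(x, p):
    """Hypothetical spatial attention of shape (1, H, W); channel dim fully collapsed."""
    c, h, w = x.shape
    q = conv1x1(p["sp_q"], x).mean(axis=(1, 2))       # (C/2,): global average pooling
    v = conv1x1(p["sp_v"], x).reshape(c // 2, h * w)  # keeps full HW resolution
    z = softmax(q, axis=0) @ v                        # (HW,): softmax-weighted sum over channels
    return sigmoid(z).reshape(1, h, w)


def init_params(c, rng):
    # Hypothetical random weights; a trained block would learn these.
    half = c // 2
    return {
        "ch_q": rng.standard_normal((1, c)) * 0.1,
        "ch_v": rng.standard_normal((half, c)) * 0.1,
        "ch_z": rng.standard_normal((c, half)) * 0.1,
        "sp_q": rng.standard_normal((half, c)) * 0.1,
        "sp_v": rng.standard_normal((half, c)) * 0.1,
    }


def psa(x, p, layout="parallel"):
    if x.shape[0] % 2 != 0:
        raise ValueError("channel count must be even")
    if layout == "parallel":
        # Both branches reweight the same input; the results are summed.
        return channel_only(x, p) * x + spatial_only(x, p) * x
    if layout == "sequential":
        # Channel-only branch first, then spatial-only branch on its output.
        y = channel_only(x, p) * x
        return spatial_only(y, p) * y
    raise ValueError(f"unknown layout: {layout}")


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
p = init_params(8, rng)
print(psa(x, p, "parallel").shape, psa(x, p, "sequential").shape)
# (8, 16, 16) (8, 16, 16)
```

Both layouts return a tensor with the input's shape, so the `layout` argument can swap between the sequential and parallel arrangements that the abstract reports as giving only marginal metric differences.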

Related research

06/28/2020

Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates

The typical bottom-up human pose estimation framework includes two stage...

06/08/2019

A Coarse-to-Fine Framework for Learned Color Enhancement with Non-Local Attention

Automatic color enhancement is aimed at automatically and adaptively adju...

09/08/2019

Squeeze-and-Attention Networks for Semantic Segmentation

Squeeze-and-excitation (SE) module enhances the representational power o...

05/12/2021

A Large-Scale Benchmark for Food Image Segmentation

Food image segmentation is a critical and indispensable task for develop...

08/09/2021

PSGR: Pixel-wise Sparse Graph Reasoning for COVID-19 Pneumonia Segmentation in CT Images

Automated and accurate segmentation of the infected regions in computed ...

07/03/2023

Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation

Most existing ultra-high resolution (UHR) segmentation methods always st...
