ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

01/11/2021
by Shaofei Huang, et al.

Learning to capture dependencies between spatial positions is essential to many visual tasks, especially dense labeling problems such as scene parsing. Existing methods effectively capture long-range dependencies with self-attention mechanisms and short-range ones with local convolutions. However, a large gap remains between long-range and short-range dependencies, which substantially limits a model's flexibility in handling the diverse spatial scales and relationships found in complicated natural scene images. To fill this gap, we develop a Middle-Range (MR) branch that captures middle-range dependencies by restricting self-attention to local patches. We also observe that spatial regions with large correlations to others can be emphasized to exploit long-range dependencies more accurately, and thus propose a Reweighed Long-Range (RLR) branch. Building on the MR and RLR branches, we construct an Omni-Range Dependencies Network (ORDNet) that effectively captures short-, middle- and long-range dependencies. ORDNet extracts more comprehensive context information and adapts well to complex spatial variance in scene images. Extensive experiments show that ORDNet outperforms previous state-of-the-art methods on three scene parsing benchmarks, PASCAL Context, COCO Stuff and ADE20K, demonstrating the benefit of capturing omni-range dependencies in deep models for the scene parsing task.
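To make the idea of restricting self-attention to local patches concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes identity query/key/value projections, scaled dot-product attention, non-overlapping square patches, and a feature map whose height and width are divisible by the patch size. The names `self_attention` and `patch_self_attention` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Scaled dot-product self-attention over tokens of shape (n, c).

    Assumption: identity projections stand in for learned Q/K/V layers.
    """
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores, axis=-1) @ tokens


def patch_self_attention(feat, patch):
    """Apply self-attention independently inside each patch x patch window.

    feat has shape (H, W, C); each position attends only to positions
    in its own window, so the dependency range is bounded by the patch size.
    """
    feat = np.asarray(feat, dtype=np.float64)
    H, W, C = feat.shape
    assert H % patch == 0 and W % patch == 0, "H and W must be divisible by patch"
    out = np.empty_like(feat)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            window = feat[i:i + patch, j:j + patch].reshape(-1, C)
            attended = self_attention(window)
            out[i:i + patch, j:j + patch] = attended.reshape(patch, patch, C)
    return out
```

In this sketch, setting `patch` equal to the full feature-map size recovers ordinary global self-attention, while smaller values shrink the range each position can attend to, which is the intuition behind a middle-range branch sitting between convolution and global attention.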


