Recurrent Scene Parsing with Perspective Understanding in the Loop

05/20/2017
by Shu Kong, et al.

Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth), so that small details are preserved for distant objects while larger receptive fields are used for nearby ones. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a substantially more compact model. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side-information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth estimation yields state-of-the-art results for quantitative monocular depth estimation.
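
The depth-gated pooling idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the pooling operator, the number of scales, or how depth is quantized, so the average pooling, the equal-width depth bins, the hard one-hot gate, and the names `box_pool` and `depth_gated_pool` are all assumptions made for illustration. Each pixel picks one pooling window size from its depth bin, with larger windows for nearer pixels.

```python
import numpy as np


def box_pool(features, size):
    """Average-pool an (H, W, C) array with an odd size x size window, stride 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window is centered")
    pad = size // 2
    # Edge padding keeps the output the same height and width as the input.
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, _ = features.shape
    out = np.zeros(features.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (size * size)


def depth_gated_pool(features, depth, sizes=(9, 5, 1)):
    """Select a pooling window per pixel from its depth bin (hypothetical sketch).

    `sizes` is ordered from nearest bin to farthest bin, so nearby pixels
    get the largest window and distant pixels the smallest.
    """
    n_bins = len(sizes)
    # Assumption: equal-width depth bins between the min and max depth.
    edges = np.linspace(depth.min(), depth.max(), n_bins + 1)[1:-1]
    bins = np.digitize(depth, edges)  # 0 = nearest bin, n_bins - 1 = farthest
    out = np.zeros(features.shape, dtype=np.float64)
    for k, size in enumerate(sizes):
        gate = (bins == k)[..., None]  # hard one-hot gate, shape (H, W, 1)
        out += gate * box_pool(features, size)
    return out, bins


# Toy usage: left half of the image is near, right half is far.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 2))
depth = np.where(np.arange(8)[None, :] < 4, 1.0, 10.0) * np.ones((8, 8))
pooled, bins = depth_gated_pool(features, depth, sizes=(3, 1))
```

In the toy usage, the far half is pooled with a 1x1 window and therefore passes through unchanged, while the near half is smoothed by a 3x3 average. A learned model would presumably replace the hard gate with a differentiable one and the box filter with learned filters; those details are outside what the abstract states.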

research
10/06/2016

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Augmenting RGB data with measured depth has been shown to improve the pe...
research
10/03/2019

3D Neighborhood Convolution: Learning Depth-Aware Features for RGB-D and RGB Semantic Segmentation

A key challenge for RGB-D segmentation is how to effectively incorporate...
research
10/04/2022

FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions

In this work we present FreDSNet, a deep learning solution which obtains...
research
05/22/2017

DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels

In the context of scene understanding, a variety of methods exists to es...
research
02/17/2020

3D Gated Recurrent Fusion for Semantic Scene Completion

This paper tackles the problem of data fusion in the semantic scene comp...
research
03/15/2016

Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation

State-of-the-art results of semantic segmentation are established by Ful...
research
02/26/2017

Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation

This paper addresses the task of designing a modular neural network arch...
