Boundary-induced and scene-aggregated network for monocular depth prediction

by Feng Xue, et al.

Monocular depth prediction is an important task in scene understanding. It aims to predict the dense depth of a single RGB image. With the development of deep learning, performance on this task has improved greatly. However, two issues remain unresolved: (1) the deep feature encodes the wrong farthest region in a scene, which distorts the 3D structure of the predicted depth; (2) the low-level features are insufficiently utilized, which makes it even harder to estimate depth near edges with sudden depth changes. To tackle these two issues, we propose the Boundary-induced and Scene-aggregated network (BS-Net). In this network, the Depth Correlation Encoder (DCE) is first designed to obtain the contextual correlations between the regions in an image and to perceive the farthest region by considering these correlations. Meanwhile, the Bottom-Up Boundary Fusion (BUBF) module is designed to extract accurate boundaries that indicate depth change. Finally, the Stripe Refinement module (SRM) is designed to refine the dense depth induced by the boundary cue, which improves the boundary accuracy of the predicted depth. Experimental results on the NYUD v2 and iBims-1 datasets demonstrate the state-of-the-art performance of the proposed approach, and the SUN-RGBD dataset is used to evaluate the generalization of our method. Code is available at
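
To make the notion of a "boundary that indicates depth change" concrete, the following is a minimal sketch that marks pixels where a depth map changes abruptly, using plain finite-difference gradients. It is not the BUBF module or any other part of BS-Net; the function name `depth_boundary_mask`, the `threshold` parameter, and the toy depth map are all hypothetical and chosen only for illustration.

```python
import numpy as np


def depth_boundary_mask(depth, threshold):
    """Return a boolean mask of pixels where depth changes abruptly.

    Illustrative assumption: a boundary is any pixel whose depth-gradient
    magnitude exceeds `threshold` (same units as the depth map, per pixel).
    """
    depth = np.asarray(depth, dtype=np.float64)
    # Finite-difference gradients along rows (axis 0) and columns (axis 1).
    grad_rows, grad_cols = np.gradient(depth)
    magnitude = np.hypot(grad_rows, grad_cols)
    return magnitude > threshold


# Toy scene: a near surface at 1 m on the left, a far surface at 4 m on the right.
depth = np.full((4, 6), 1.0)
depth[:, 3:] = 4.0

mask = depth_boundary_mask(depth, threshold=1.0)
print(mask.astype(int))
```

In this toy map only the two columns on either side of the depth jump are marked, which is the kind of region where the abstract says depth is hardest to estimate.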




Monocular Depth Estimation with Sharp Boundary

Monocular depth estimation is the base task in computer vision. It has a...

Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

Monocular depth estimation is an essential task for scene understanding....

Monocular Depth Distribution Alignment with Low Computation

The performance of monocular depth estimation generally depends on the a...

Learning to Recover 3D Scene Shape from a Single Image

Despite significant progress in monocular depth estimation in the wild, ...

Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention

Monocular Depth Estimation (MDE) aims to predict pixel-wise depth given ...

Efficient Semantic Scene Completion Network with Spatial Group Convolution

We introduce Spatial Group Convolution (SGC) for accelerating the comput...
