S&CNet: A Enhanced Coarse-to-fine Framework For Monocular Depth Completion
Real-time depth completing is a critical problem for robotics and autonomous driving tasks. In this paper, we propose a light-weight coarse-to-fine network to complete a dense depth map from a single view RGB image and its related sparse depth map. Both the coarse estimation network and refinement network are in encoder-decoder form. To boost the performance of the coarse estimation network, we propose a novel spatial-and-channel (S&C) enhancer to boost the representation power of encoder network. The motivation for spatial-wise attention is from our finding that a lower output stride of encoder network preserve more detail but limit the receptive field. Thus, we employee spatial-wise attention to capture long-range contextual information. Besides, we found each channel in the features generated by the encoder network response to different distance. This discovery drives us to adopt channel-wise attention mechanism to reassign the weights of different channels as the decoder network should pay more attention to the channels response to distance contain rich objects, intuitively. To further improve the performance of our network, we adopt a refinement network which take the coarse estimation and sparse depth map as input. We evaluate our approach on KITTI benchmark, and the results show that our approach achieves competitive performance on RMSE metric with the state-of-the-art over published works but outperform it in all other metrics (iRMSE,MAE and iMAE) significantly with almost 3:5 times higher running speed. Crucially, our proposed S&C enhancer can be plugged into other existing networks and boost their performance significantly with a minimal additional computational cost.
READ FULL TEXT 
  
  
     share
 share