RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention

02/28/2023
by Ran Qin, et al.

Planar grasp detection is one of the most fundamental tasks in robotic manipulation, and recent progress in consumer-grade RGB-D sensors makes it possible to extract more comprehensive features from both the texture and shape modalities. However, depth maps are generally of lower quality and carry much stronger noise than RGB images, which makes it challenging to acquire the grasp depth and to fuse multi-modal cues. To address these two issues, this paper proposes a novel learning-based approach to RGB-D grasp detection, the Depth Guided Cross-modal Attention Network (DGCAN). To better leverage the geometric information recorded in the depth channel, a complete 6-dimensional rectangle representation is adopted, in which the grasp depth is explicitly considered in addition to the parameters of the common 5-dimensional representation. Predicting the extra grasp depth substantially strengthens feature learning, thereby leading to more accurate results. Moreover, to reduce the negative impact of the discrepancy in data quality between the two modalities, a Local Cross-modal Attention (LCA) module is designed, in which the depth features are refined according to cross-modal relations and concatenated with the RGB features for more thorough fusion. Extensive simulation and physical evaluations are conducted, and the experimental results highlight the superiority of the proposed approach.
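
To make the 6-dimensional rectangle representation concrete, the following is a minimal sketch, not the authors' code. It assumes the common 5-dimensional grasp rectangle is parameterized as (x, y, w, h, theta) and that the sixth value is a grasp depth z. The class name `Grasp6D`, its field names, and the units (pixels, radians, metres) are hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Grasp6D:
    """Hypothetical container for a 6-D planar grasp rectangle (illustrative only)."""
    x: float      # rectangle centre, image column (assumed pixels)
    y: float      # rectangle centre, image row (assumed pixels)
    w: float      # rectangle width, e.g. gripper opening (assumed pixels)
    h: float      # rectangle height, e.g. gripper plate size (assumed pixels)
    theta: float  # in-plane rotation of the rectangle (assumed radians)
    z: float      # grasp depth, the extra sixth parameter (assumed metres)

    def to_5d(self):
        """Drop the grasp depth to recover the 5-D rectangle parameters."""
        return (self.x, self.y, self.w, self.h, self.theta)

    def corners(self):
        """Return the four corners of the rotated rectangle as (x, y) tuples."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        half = [(-self.w / 2, -self.h / 2), (self.w / 2, -self.h / 2),
                (self.w / 2, self.h / 2), (-self.w / 2, self.h / 2)]
        # Rotate each corner offset by theta, then translate to the centre.
        return [(self.x + dx * c - dy * s, self.y + dx * s + dy * c)
                for dx, dy in half]


if __name__ == "__main__":
    grasp = Grasp6D(x=100.0, y=50.0, w=40.0, h=20.0, theta=0.0, z=0.3)
    print(grasp.to_5d())
    print(grasp.corners())
```

In this sketch the grasp depth is stored alongside the image-plane parameters, so a 5-dimensional pipeline can still consume `to_5d()` while a depth-aware one reads `z` as well.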


research
03/19/2020

Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection

There are two main issues in RGB-D salient object detection: (1) how to ...
research
08/02/2022

Robust RGB-D Fusion for Saliency Detection

Efficiently exploiting multi-modal inputs for accurate RGB-D saliency de...
research
04/14/2021

Discrete Cosine Transform Network for Guided Depth Map Super-Resolution

Guided depth super-resolution (GDSR) is a hot topic in multi-modal image...
research
10/19/2022

Spatio-channel Attention Blocks for Cross-modal Crowd Counting

Crowd counting research has made significant advancements in real-world ...
research
04/12/2022

Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance

Image outpainting technology generates visually reasonable content regar...
research
11/22/2016

Quad-networks: unsupervised learning to rank for interest point detection

Several machine learning tasks require to represent the data using only ...
research
02/24/2019

Vision Based Picking System for Automatic Express Package Dispatching

This paper presents a vision based robotic system to handle the picking ...
