Multimodal Deep Learning for Robust RGB-D Object Recognition

07/24/2015
by Andreas Eitel, et al.

Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams, one for each modality, whose outputs are subsequently combined by a late fusion network. We focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, we introduce a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs. The first is an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second is a data augmentation scheme for robust learning with depth images that corrupts them with realistic noise patterns. We present state-of-the-art results on the RGB-D object dataset and show recognition in challenging, noisy real-world RGB-D settings.
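
The three ingredients named in the abstract (a depth encoding, noise-based depth augmentation, and late fusion of two streams) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `encode_depth`, `corrupt_depth`, and `late_fusion` are hypothetical, the encoding simply replicates a normalized depth channel three times, the noise model just drops random pixels, and the fusion step is a single linear layer with softmax over concatenated per-stream features.

```python
import math
import random


def encode_depth(depth, invalid=0.0):
    """Map a 2-D depth map (nested lists) to three identical 0-255 channels.

    Assumption: channel replication stands in for whatever encoding the
    paper uses. Invalid readings and the nearest valid reading both map to 0.
    """
    valid = [d for row in depth for d in row if d != invalid]
    if not valid:
        return [[(0, 0, 0) for _ in row] for row in depth]
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1.0  # avoid division by zero on flat depth maps

    def scale(d):
        return 0 if d == invalid else round(255 * (d - lo) / span)

    return [[(scale(d),) * 3 for d in row] for row in depth]


def corrupt_depth(depth, drop_prob, rng, invalid=0.0):
    """Replace each reading with `invalid` with probability `drop_prob`.

    Assumption: independent pixel dropout is a stand-in for the realistic
    noise patterns mentioned in the abstract, which are not specified there.
    """
    return [[invalid if rng.random() < drop_prob else d for d in row]
            for row in depth]


def late_fusion(rgb_feat, depth_feat, weights, bias):
    """Concatenate per-stream features, apply one linear layer, then softmax."""
    fused = list(rgb_feat) + list(depth_feat)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: augment and encode a toy depth map, then fuse toy stream features.
rng = random.Random(0)
depth = [[0.0, 1.0], [2.0, 3.0]]
encoded = encode_depth(corrupt_depth(depth, drop_prob=0.5, rng=rng))
probs = late_fusion(
    rgb_feat=[0.2, 0.8],          # hypothetical RGB-stream features
    depth_feat=[0.5],             # hypothetical depth-stream feature
    weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    bias=[0.0, 0.0],
)
```

In a real system each feature vector would come from a trained CNN stream and the fusion weights would be learned; the sketch only shows where the two modalities meet.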


