Learning Effective RGB-D Representations for Scene Recognition

09/17/2018
by Xinhang Song, et al.

Deep convolutional neural networks (CNNs) achieve impressive results on RGB scene recognition thanks to large datasets such as Places. RGB-D scene recognition, in contrast, remains underdeveloped because of two limitations of RGB-D data that we address in this paper. The first limitation is the lack of depth data for training deep learning models. Rather than fine-tuning or transferring RGB-specific features, we address this limitation by proposing an architecture and a two-step training approach that directly learns effective depth-specific features using weak supervision via patches. The resulting RGB-D model also benefits from more complementary multimodal features. The second limitation is the short range of depth sensors (typically 0.5 m to 5.5 m), which means that depth images fail to capture distant objects that RGB images can. We show that this limitation can be addressed by using RGB-D videos, where more comprehensive depth information is accumulated as the camera travels across the scene. Focusing on this scenario, we introduce the ISIA RGB-D video dataset to evaluate RGB-D scene recognition with videos. Our video recognition architecture combines convolutional and recurrent neural networks (RNNs), trained in three steps with increasingly complex data (i.e., patches, frames and sequences) to learn effective features. Our approach obtains state-of-the-art performance on RGB-D image (NYUD2 and SUN RGB-D) and video (ISIA RGB-D) scene recognition.
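To illustrate what "weak supervision via patches" can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes that patches are cut from a depth map on a regular grid and that each patch simply inherits the scene label of the whole image. The function name `extract_weakly_labeled_patches`, the patch size, the stride, and the toy depth map are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
from typing import List, Tuple

DepthMap = List[List[float]]
LabeledPatch = Tuple[DepthMap, str]


def extract_weakly_labeled_patches(
    depth: DepthMap, scene_label: str, patch_size: int, stride: int
) -> List[LabeledPatch]:
    """Cut a depth map into square patches on a regular grid.

    Each patch inherits the image-level scene label. This is the assumed
    form of weak supervision: the label describes the whole scene, not
    necessarily the content of the individual patch.
    """
    height = len(depth)
    width = len(depth[0]) if height else 0
    patches: List[LabeledPatch] = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patch = [
                row[left:left + patch_size]
                for row in depth[top:top + patch_size]
            ]
            patches.append((patch, scene_label))
    return patches


# Toy 4x6 "depth map" with made-up values, labeled with a hypothetical class.
depth = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
patches = extract_weakly_labeled_patches(depth, "bedroom", patch_size=2, stride=2)
```

In a setup like this, a patch-level network could be pretrained on the many `(patch, label)` pairs before moving on to full images, which is one way to get more training samples out of a small depth dataset.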


