When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition

04/26/2020
by Ali Caglayan, et al.

Object recognition and scene recognition are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors for these tasks has emerged as an important focus area for better visual understanding. Meanwhile, deep neural networks, especially convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks, replacing hand-crafted features with effective deep features. However, how to exploit deep features from a multi-layer CNN model effectively remains an open problem. In this paper, we propose a novel two-stage framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition. In the first stage, a pretrained CNN model is employed as a backbone to extract visual features at multiple levels. The second stage then efficiently maps these features into high-level representations with a fully randomized structure of recursive neural networks (RNNs). To cope with the high dimensionality of CNN activations, we propose a random weighted pooling scheme that extends the idea of randomness in RNNs. Multi-modal fusion is performed through a soft voting approach, with weights computed from the individual recognition confidences (i.e., SVM scores) of the RGB and depth streams; this yields consistent class-label estimates in the final RGB-D classification. Extensive experiments verify that the fully randomized RNN stage successfully encodes CNN activations into discriminative, compact features. Comparative results on the popular Washington RGB-D Object and SUN RGB-D Scene datasets show that the proposed approach significantly outperforms state-of-the-art methods on both object and scene recognition tasks.
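The two key ideas in the abstract can be sketched in a few lines of NumPy: encoding CNN activations with fixed (untrained) random recursive networks, and fusing the two modality streams by confidence-weighted soft voting. This is only an illustrative sketch, not the authors' implementation: the pairwise merge structure, the dimensions, and the use of a softmax over SVM decision scores as the confidence weight are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rnn_encode(features, num_rnn=4, out_dim=32):
    """Encode a set of CNN activation vectors (n patches x d dims)
    with several fixed random recursive networks.

    Each RNN projects the patches with a random matrix, then
    recursively merges pairs with a fixed random square matrix and a
    tanh nonlinearity until one vector remains (an assumed,
    simplified merge scheme). The per-RNN codes are concatenated.
    """
    n, d = features.shape
    codes = []
    for _ in range(num_rnn):
        W = rng.standard_normal((out_dim, d)) / np.sqrt(d)
        h = np.tanh(features @ W.T)              # (n, out_dim)
        Wm = rng.standard_normal((out_dim, out_dim)) / np.sqrt(out_dim)
        while h.shape[0] > 1:
            if h.shape[0] % 2:                   # pad odd counts
                h = np.vstack([h, h[-1:]])
            merged = h[0::2] + h[1::2]           # merge sibling pairs
            h = np.tanh(merged @ Wm.T)
        codes.append(h[0])
    return np.concatenate(codes)                 # (num_rnn * out_dim,)

def soft_vote(rgb_scores, depth_scores):
    """Fuse per-class scores of the RGB and depth streams.

    Each stream's weight is its own peak softmax confidence, a
    hypothetical stand-in for the paper's SVM-score-based weights.
    """
    def confidence(scores):
        p = np.exp(scores - scores.max())
        p /= p.sum()
        return p.max(), p
    w_rgb, p_rgb = confidence(rgb_scores)
    w_depth, p_depth = confidence(depth_scores)
    fused = w_rgb * p_rgb + w_depth * p_depth
    return int(np.argmax(fused))
```

Because the projection and merge matrices are random and fixed, the encoding stage needs no training; only the final classifier (an SVM in the paper) is fit on the resulting codes.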


