Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling

03/13/2018
by   Abrar H. Abdulnabi, et al.

This paper proposes a new method, Multimodal RNNs, for RGB-D semantic scene segmentation. The method classifies image pixels given two input sources: RGB color channels and depth maps. It trains two recurrent neural networks (RNNs) simultaneously; the two networks are cross-connected through information transfer layers, which learn to adaptively extract relevant cross-modality features. Each RNN learns its representations from its own previous hidden states and from patterns transferred from the other RNN's previous hidden states, so both model-specific and cross-modality features are retained. We exploit the structure of quad-directional 2D-RNNs to model short- and long-range contextual information in the 2D input image. We carefully design various baselines to examine the proposed model structure. We test Multimodal RNNs on popular RGB-D benchmarks and show that the method significantly outperforms previous methods and achieves results competitive with other state-of-the-art work.
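The cross-connection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain tanh recurrence, runs along a 1D chain rather than the quad-directional 2D sweeps used in the paper, and every name in it (`step`, `run_multimodal_chain`, `T_d2rgb`, `T_rgb2d`, the weight shapes) is hypothetical. Each modality updates its hidden state from its own input, its own previous hidden state, and the other modality's previous hidden state passed through a transfer matrix.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def step(x, h_own, h_other, U, W, T):
    """One update for a single modality (assumed form, not from the paper):
    h = tanh(U x + W h_own_prev + T h_other_prev),
    where T plays the role of the information transfer layer."""
    a, b, c = matvec(U, x), matvec(W, h_own), matvec(T, h_other)
    return [math.tanh(ai + bi + ci) for ai, bi, ci in zip(a, b, c)]


def run_multimodal_chain(rgb_seq, depth_seq, params, hidden):
    """Run both RNNs in lockstep along a 1D chain of positions."""
    h_rgb = [0.0] * hidden
    h_d = [0.0] * hidden
    states = []
    for x_rgb, x_d in zip(rgb_seq, depth_seq):
        # Both updates read the *previous* hidden states of both networks.
        new_rgb = step(x_rgb, h_rgb, h_d,
                       params["U_rgb"], params["W_rgb"], params["T_d2rgb"])
        new_d = step(x_d, h_d, h_rgb,
                     params["U_d"], params["W_d"], params["T_rgb2d"])
        h_rgb, h_d = new_rgb, new_d
        states.append((h_rgb, h_d))
    return states


rng = random.Random(0)
feat, hidden, length = 3, 4, 5  # illustrative sizes only
params = {
    "U_rgb": rand_matrix(hidden, feat, rng),
    "W_rgb": rand_matrix(hidden, hidden, rng),
    "T_d2rgb": rand_matrix(hidden, hidden, rng),  # depth -> RGB transfer
    "U_d": rand_matrix(hidden, feat, rng),
    "W_d": rand_matrix(hidden, hidden, rng),
    "T_rgb2d": rand_matrix(hidden, hidden, rng),  # RGB -> depth transfer
}
rgb_seq = [[rng.random() for _ in range(feat)] for _ in range(length)]
depth_seq = [[rng.random() for _ in range(feat)] for _ in range(length)]
states = run_multimodal_chain(rgb_seq, depth_seq, params, hidden)
```

In this sketch, setting a transfer matrix to zero decouples that network from the other modality, which is one simple way to think about a single-modality baseline; whether the paper's baselines are built this way is not stated in the abstract.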


research
09/02/2015

DAG-Recurrent Neural Networks For Scene Labeling

In image labeling, local representations for image units are usually gen...
research
03/09/2017

DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks

3D scene understanding is important for robots to interact with the 3D w...
research
07/08/2016

Multi-level Contextual RNNs with Attention Model for Scene Labeling

Context in image is crucial for scene labeling while existing methods on...
research
09/17/2018

Learning Effective RGB-D Representations for Scene Recognition

Deep convolutional networks (CNN) can achieve impressive results on RGB ...
research
08/27/2016

Multi-Path Feedback Recurrent Neural Network for Scene Parsing

In this paper, we consider the scene parsing problem and propose a novel...
research
03/30/2022

Pay Attention to Hidden States for Video Deblurring: Ping-Pong Recurrent Neural Networks and Selective Non-Local Attention

Video deblurring models exploit information in the neighboring frames to...
research
07/05/2023

A Versatile Hub Model For Efficient Information Propagation And Feature Selection

Hub structure, characterized by a few highly interconnected nodes surrou...
