GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion

09/17/2016
by Ankit Gandhi, et al.

Data generated from real-world events are usually temporal and contain multimodal information such as audio, visual, depth, and sensor streams, which must be intelligently combined for classification tasks. In this paper, we propose a novel generalized deep neural network architecture in which temporal streams from multiple modalities are combined. The proposed network has a total of M+1 components, where M is the number of modalities. The first component is a novel temporally hybrid Recurrent Neural Network (RNN) that exploits the complementary nature of the multimodal temporal information by allowing the network to learn both modality-specific temporal dynamics and the dynamics in a shared multimodal feature space. The remaining M components extract discriminative but non-temporal cues from each modality. Finally, the predictions from all of these components are linearly combined using a set of automatically learned weights. We perform exhaustive experiments on three different datasets spanning four modalities. The proposed network achieves a relative improvement of 3.5% over the best-performing temporal multimodal baseline on UCF-101, and it also outperforms the corresponding baselines on the CCV and Multimodal Gesture datasets.
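To make the described architecture concrete, below is a minimal PyTorch-style sketch of a GeThR-Net-like model. This is not the authors' implementation: the class names, layer sizes, the choice of mean-pooled features for the non-temporal components, and the softmax-normalized mixing weights are all illustrative assumptions.

```python
# A minimal, hypothetical sketch of a GeThR-Net-style model in PyTorch
# (not the authors' implementation; names and sizes are assumptions).
import torch
import torch.nn as nn


class TemporallyHybridRNN(nn.Module):
    """Component 1: modality-specific LSTMs plus an LSTM over the
    concatenated multimodal feature space."""

    def __init__(self, feat_dims, hidden, num_classes):
        super().__init__()
        # One LSTM per modality captures modality-specific dynamics.
        self.mod_rnns = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in feat_dims)
        # An LSTM over concatenated features captures multimodal dynamics.
        self.joint_rnn = nn.LSTM(sum(feat_dims), hidden, batch_first=True)
        # Classify from the final hidden states of all (M + 1) streams.
        self.head = nn.Linear(hidden * (len(feat_dims) + 1), num_classes)

    def forward(self, xs):                     # xs: list of (B, T, d_m) tensors
        finals = []
        for rnn, x in zip(self.mod_rnns, xs):  # modality-specific streams
            _, (h, _) = rnn(x)
            finals.append(h[-1])
        _, (h, _) = self.joint_rnn(torch.cat(xs, dim=-1))  # multimodal stream
        finals.append(h[-1])
        return self.head(torch.cat(finals, dim=-1))


class GeThRNetSketch(nn.Module):
    """M + 1 components whose class scores are linearly combined
    using automatically learned weights."""

    def __init__(self, feat_dims, hidden, num_classes):
        super().__init__()
        self.temporal = TemporallyHybridRNN(feat_dims, hidden, num_classes)
        # M non-temporal components: here, a classifier on each modality's
        # temporally averaged features (an illustrative assumption).
        self.static = nn.ModuleList(
            nn.Linear(d, num_classes) for d in feat_dims)
        self.mix = nn.Parameter(torch.ones(len(feat_dims) + 1))

    def forward(self, xs):
        scores = [self.temporal(xs)]
        scores += [head(x.mean(dim=1)) for head, x in zip(self.static, xs)]
        w = torch.softmax(self.mix, dim=0)     # learned combination weights
        return sum(wi * s for wi, s in zip(w, scores))


# Usage: two modalities (e.g. audio and visual features) over 20 time steps.
model = GeThRNetSketch(feat_dims=[40, 128], hidden=64, num_classes=10)
logits = model([torch.randn(8, 20, 40), torch.randn(8, 20, 128)])
print(logits.shape)  # torch.Size([8, 10])
```

Normalizing the mixing parameters with a softmax is one plausible way to keep the learned fusion weights positive and summing to one; the paper's exact weighting scheme may differ.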

Related research

04/11/2017
Deep Multimodal Representation Learning from Temporal Data
In recent years, Deep Learning has been successfully applied to multimod...

06/18/2017
3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
Audio-visual recognition (AVR) has been considered as a solution for spe...

11/27/2020
Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion
In this paper, we study the task of multimodal sequence analysis which a...

10/19/2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
We present an audio-visual multimodal approach for the task of zeroshot ...

04/16/2019
Multimodal Subspace Support Vector Data Description
In this paper, we propose a novel method for projecting data from multip...

11/22/2019
Factorized Multimodal Transformer for Multimodal Sequential Learning
The complex world around us is inherently multimodal and sequential (con...

10/20/2022
A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition
Utilizing the sensor characteristics of the audio, visible camera, and t...
