Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels

07/03/2017
by Dhanesh Ramachandram, et al.

A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.
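To make the idea of a kernel over a discrete space of fusion structures concrete, here is a minimal sketch. It is not the authors' kernel or search space. It rests on three assumptions: a fusion structure is simplified to a flat set partition of the modalities (which modalities are fused together) rather than a full tree, two structures are neighbours when one is obtained from the other by merging two groups, and a standard diffusion kernel on that graph stands in for the proposed graph-induced kernel. The helper names (`set_partitions`, `canonical`, `merge_neighbours`, `diffusion_kernel`) and the modality labels are hypothetical.

```python
import itertools
import numpy as np


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def canonical(partition):
    """Hashable, order-independent form of a partition."""
    return frozenset(frozenset(block) for block in partition)


def merge_neighbours(structure):
    """Yield the structures reachable by merging two groups (assumed edit move)."""
    for a, b in itertools.combinations(list(structure), 2):
        yield frozenset((set(structure) - {a, b}) | {a | b})


def diffusion_kernel(nodes, beta=0.5):
    """Diffusion kernel exp(-beta * L) on the merge graph over `nodes`."""
    index = {node: i for i, node in enumerate(nodes)}
    adjacency = np.zeros((len(nodes), len(nodes)))
    for node in nodes:
        for neighbour in merge_neighbours(node):
            i, j = index[node], index[neighbour]
            adjacency[i, j] = adjacency[j, i] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # The Laplacian is symmetric, so exp(-beta * L) follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


# Hypothetical modalities, chosen only for illustration.
modalities = ["video", "audio", "pose"]
nodes = [canonical(p) for p in set_partitions(modalities)]
K = diffusion_kernel(nodes, beta=0.5)
```

In a Bayesian optimization loop, a matrix like `K` could serve as the covariance of a Gaussian process surrogate over candidate structures, so that structures a few merges apart are expected to perform similarly. The diffusion kernel is used here because it is positive definite on any undirected graph; `beta` controls how quickly similarity decays with graph distance.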


research
04/29/2020

EmbraceNet for Activity: A Deep Multimodal Fusion Architecture for Activity Recognition

Human activity recognition using multiple sensors is a challenging but p...
research
03/15/2019

MFAS: Multimodal Fusion Architecture Search

We tackle the problem of finding good architectures for multimodal class...
research
02/03/2021

MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

One important challenge of applying deep learning to electronic health r...
research
01/24/2022

MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis

Current deep learning approaches for multimodal fusion rely on bottom-up...
research
05/20/2023

Efficient Multimodal Neural Networks for Trigger-less Voice Assistants

The adoption of multimodal interactions by Voice Assistants (VAs) is gro...
research
05/30/2021

Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing

For multimodal tasks, a good feature extraction network should extract i...
research
04/26/2022

Multi stain graph fusion for multimodal integration in pathology

In pathology, tissue samples are assessed using multiple staining techni...
