Gated Multimodal Units for Information Fusion

02/07/2017
by John Arevalo, et al.

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. Using multiplicative gates, the GMU learns to decide how each modality influences the activation of the unit. It was evaluated in a multilabel setting: genre classification of movies from the plot and the poster. The GMU improved on the macro F-score of single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, the MM-IMDb dataset is released, which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.
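To make the gating idea in the abstract concrete, the following is a minimal sketch of a forward pass for a two-modality gated unit. It assumes the commonly cited bimodal form, in which each modality gets its own tanh projection and a sigmoid gate computed from the concatenated inputs mixes the two. The abstract itself does not give equations, so the function names, the weight names (`W_v`, `W_t`, `W_z`), the dimensions, and the random initialization are illustrative assumptions, not the authors' implementation. Biases and training are omitted.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gmu_forward(x_v, x_t, W_v, W_t, W_z):
    """Hypothetical bimodal gated unit (assumed form, not from the abstract).

    x_v, x_t : feature vectors for the visual and textual modality
    W_v, W_t : per-modality projection matrices
    W_z      : gate matrix applied to the concatenated inputs
    Returns the fused vector h and the gate activations z.
    """
    h_v = [math.tanh(a) for a in matvec(W_v, x_v)]    # visual hidden features
    h_t = [math.tanh(a) for a in matvec(W_t, x_t)]    # textual hidden features
    z = [sigmoid(a) for a in matvec(W_z, x_v + x_t)]  # gate over [x_v; x_t]
    # Multiplicative gating: z weights the visual part, (1 - z) the textual part.
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    """Small random weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions (assumptions): 4 visual features, 3 text features, 2 hidden units.
rng = random.Random(0)
dim_v, dim_t, dim_h = 4, 3, 2
x_v = [rng.uniform(-1, 1) for _ in range(dim_v)]
x_t = [rng.uniform(-1, 1) for _ in range(dim_t)]
W_v = random_matrix(dim_h, dim_v, rng)
W_t = random_matrix(dim_h, dim_t, rng)
W_z = random_matrix(dim_h, dim_v + dim_t, rng)

h, z = gmu_forward(x_v, x_t, W_v, W_t, W_z)
```

Because each gate value lies strictly between 0 and 1, every component of `h` is a convex combination of the corresponding visual and textual hidden features. Under this assumed form, inspecting `z` shows how much each modality contributes to each hidden unit for a given input.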

research
07/06/2019

Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

This paper presents a novel deep neural network (DNN) for multimodal fus...
research
01/26/2020

Multimodal Data Fusion based on the Global Workspace Theory

We propose a novel neural network architecture, named the Global Workspa...
research
02/03/2018

Multimodal Sentiment Analysis with Word-Level Fusion and Reinforcement Learning

With the increasing popularity of video sharing websites such as YouTube...
research
10/27/2021

Detecting Dementia from Speech and Transcripts using Transformers

Alzheimer's disease (AD) constitutes a neurodegenerative disease with se...
research
04/11/2022

Mixture-of-experts VAEs can disregard variation in surjective multimodal data

Machine learning systems are often deployed in domains that entail data ...
research
07/01/2021

Deep Orthogonal Fusion: Multimodal Prognostic Biomarker Discovery Integrating Radiology, Pathology, Genomic, and Clinical Data

Clinical decision-making in oncology involves multimodal data such as ra...
research
07/17/2023

Clarifying the Half Full or Half Empty Question: Multimodal Container Classification

Multimodal integration is a key component of allowing robots to perceive...
