Multimodal Language Analysis with Recurrent Multistage Fusion

08/12/2018
by Paul Pu Liang, et al.

Computational modeling of human multimodal language is an emerging research area in natural language processing spanning the language, visual, and acoustic modalities. Comprehending multimodal language requires modeling not only the interactions within each modality (intra-modal interactions) but, more importantly, the interactions between modalities (cross-modal interactions). In this paper, we propose the Recurrent Multistage Fusion Network (RMFN), which decomposes the fusion problem into multiple stages, each focused on a subset of multimodal signals for specialized, effective fusion. Cross-modal interactions are modeled using this multistage fusion approach, which builds upon intermediate representations of previous stages. Temporal and intra-modal interactions are modeled by integrating our proposed fusion approach with a system of recurrent neural networks. The RMFN displays state-of-the-art performance in modeling human multimodal language across three public datasets relating to multimodal sentiment analysis, emotion recognition, and speaker traits recognition. We provide visualizations to show that each stage of fusion focuses on a different subset of multimodal signals, learning increasingly discriminative multimodal representations.
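
As a rough illustration of the multistage fusion idea, the PyTorch sketch below shows one way such a module could look: at each stage, an attention mask highlights a subset of the concatenated multimodal signal, and a small fusion network integrates the highlighted subset with the intermediate representation carried over from the previous stage. The class name, layer shapes, and three-stage setup are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class MultistageFusion(nn.Module):
    """Hypothetical sketch of multistage fusion (not the paper's exact
    model): each stage highlights a subset of the multimodal signal and
    fuses it with the previous stage's intermediate representation."""

    def __init__(self, dims=(300, 74, 35), fused_dim=128, num_stages=3):
        super().__init__()
        total = sum(dims)
        # One highlight (soft attention) and one fuse network per stage.
        self.highlight = nn.ModuleList(
            nn.Sequential(nn.Linear(total + fused_dim, total), nn.Sigmoid())
            for _ in range(num_stages))
        self.fuse = nn.ModuleList(
            nn.Sequential(nn.Linear(total + fused_dim, fused_dim), nn.Tanh())
            for _ in range(num_stages))
        self.fused_dim = fused_dim

    def forward(self, lang, visual, acoustic):
        # Concatenate per-modality features (e.g. each modality's
        # recurrent hidden state at one time step) into one signal.
        signal = torch.cat([lang, visual, acoustic], dim=-1)
        z = signal.new_zeros(signal.size(0), self.fused_dim)
        for hl, fu in zip(self.highlight, self.fuse):
            # Highlight: soft mask over the signal, conditioned on the
            # intermediate representation from the previous stage.
            mask = hl(torch.cat([signal, z], dim=-1))
            # Fuse: integrate the highlighted subset with the running
            # multimodal representation.
            z = fu(torch.cat([signal * mask, z], dim=-1))
        return z

# Example: fuse one time step of (batch of 8) trimodal features.
fusion = MultistageFusion()
z = fusion(torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35))

In the full model, a module like this would sit inside a system of recurrent networks and be applied at every time step, so that temporal and intra-modal interactions are captured by the per-modality recurrences while the staged module handles cross-modal fusion.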

Related research

08/15/2019 · M-BERT: Injecting Multimodal Information in the BERT Structure
Multimodal language analysis is an emerging research area in natural lan...

07/17/2021 · M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis
Multimodal sentiment analysis aims to recognize people's attitudes from ...

01/24/2022 · MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
Current deep learning approaches for multimodal fusion rely on bottom-up...

10/22/2020 · MTGAT: Multimodal Temporal Graph Attention Networks for Unaligned Human Multimodal Language Sequences
Human communication is multimodal in nature; it is through multiple moda...

06/30/2022 · MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models
The promise of multimodal models for real-world applications has inspire...

06/09/2022 · AttX: Attentive Cross-Connections for Fusion of Wearable Signals in Emotion Recognition
We propose cross-modal attentive connections, a new dynamic and effectiv...

11/27/2020 · Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion
In this paper, we study the task of multimodal sequence analysis which a...
