Multimodal Relational Tensor Network for Sentiment and Emotion Classification

06/07/2018
by Saurav Sahay, et al.

Understanding affect from video segments has brought researchers from the language, audio, and video domains together. Most current multimodal research in this area deals with various techniques to fuse the modalities and mostly treats the segments of a video independently. Motivated by the work of Zadeh et al. (2017) and Poria et al. (2017), we present our architecture, the Relational Tensor Network, where we use the inter-modal interactions within a segment (intra-segment) and also consider the sequence of segments in a video to model the inter-segment, inter-modal interactions. We also generate rich representations of the text and audio modalities by leveraging richer audio and linguistic context, and by fusing fine-grained knowledge-based polarity scores from the text. We present the results of our model on the CMU-MOSEI dataset and show that it outperforms many baselines and state-of-the-art methods for sentiment classification and emotion recognition.
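As a rough illustration of the two ideas the abstract names, the sketch below combines an outer-product (tensor) fusion of per-segment text and audio features, in the spirit of Zadeh et al. (2017), with a recurrent layer over the sequence of segments to capture inter-segment context. All dimensions, layer choices, and names (RelationalTensorNetworkSketch, d_text, d_audio) are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch, not the paper's code: tensor fusion within each
# segment, then an LSTM over the segment sequence of a video.
import torch
import torch.nn as nn

class RelationalTensorNetworkSketch(nn.Module):
    def __init__(self, d_text=300, d_audio=74, d_hidden=128, n_classes=7):
        super().__init__()
        # Outer-product fusion of the two modalities (with an appended 1,
        # as in Tensor Fusion Networks) yields a
        # (d_text + 1) x (d_audio + 1) interaction tensor per segment.
        fused_dim = (d_text + 1) * (d_audio + 1)
        self.project = nn.Linear(fused_dim, d_hidden)
        # Sequence model over the segments of a video models the
        # inter-segment, inter-modal interactions.
        self.segment_rnn = nn.LSTM(d_hidden, d_hidden, batch_first=True)
        self.classify = nn.Linear(d_hidden, n_classes)

    def forward(self, text, audio):
        # text:  (batch, n_segments, d_text) segment-level text embeddings;
        #        knowledge-based polarity scores could be concatenated here
        #        (an assumption about where they would enter).
        # audio: (batch, n_segments, d_audio) segment-level audio features
        ones_t = torch.ones(*text.shape[:-1], 1, device=text.device)
        ones_a = torch.ones(*audio.shape[:-1], 1, device=audio.device)
        t = torch.cat([text, ones_t], dim=-1)   # (B, S, d_text + 1)
        a = torch.cat([audio, ones_a], dim=-1)  # (B, S, d_audio + 1)
        # Intra-segment inter-modal interactions via an outer product.
        fused = torch.einsum('bsi,bsj->bsij', t, a).flatten(start_dim=2)
        h = torch.relu(self.project(fused))      # (B, S, d_hidden)
        h, _ = self.segment_rnn(h)               # inter-segment modeling
        return self.classify(h)                  # per-segment predictions

# Example: 2 videos of 5 segments each -> logits of shape (2, 5, 7)
model = RelationalTensorNetworkSketch()
logits = model(torch.randn(2, 5, 300), torch.randn(2, 5, 74))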

Related research

05/02/2018: Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
Affective computing models are essential for human behavior analysis. A ...

07/31/2022: GraphMFT: A Graph Network based Multimodal Fusion Technique for Emotion Recognition in Conversation
Multimodal machine learning is an emerging area of research, which has r...

05/11/2022: Bias and Fairness on Multimodal Emotion Detection Algorithms
Numerous studies have shown that machine learning algorithms can latch o...

11/13/2019: Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis
Multimodal language analysis often considers relationships between featu...

05/31/2019: Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams
Video game streaming provides the viewer with a rich set of audio-visual...

08/22/2022: Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module
Multimodal sentiment analysis (MSA), which supposes to improve text-base...

11/08/2022: A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer
Alzheimer's disease (AD) is a progressive neurological disorder, meaning...
