Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities

07/03/2018
by   Nathaniel Blanchard, et al.

In the last decade, video blogs (vlogs) have become an extremely popular medium through which people express sentiment. The ubiquity of these videos has increased the importance of multimodal fusion models, which combine video and audio features with traditional text features for automatic sentiment detection. Multimodal fusion offers a unique opportunity to build models that learn from the full depth of expression available to human viewers: acoustic and visual features can disambiguate transcripts that would otherwise be ambiguous. In this paper, we present a multimodal fusion model that uses only high-level video and audio features to classify the sentiment of spoken sentences. We discard traditional transcription features in order to minimize human intervention and to maximize the deployability of our model on real-world data at scale. We select high-level features that have been successful in non-affect domains in order to test their generalizability to sentiment detection. We train and test our model on the newly released CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, obtaining an F1 score of 0.8049 on the validation set and 0.6325 on the held-out challenge test set.
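As a rough illustration of the kind of text-free audio-visual fusion classifier the abstract describes, the sketch below fuses precomputed utterance-level acoustic and visual feature vectors by concatenation and predicts a binary sentiment label. It is not the authors' architecture; the feature dimensions, layer sizes, and binary target are illustrative assumptions.

# Minimal late-fusion sketch over precomputed acoustic and visual features.
# Dimensions and layer sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    def __init__(self, acoustic_dim=74, visual_dim=35, hidden=64):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.acoustic = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.visual = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        # Fuse by concatenation, then classify sentiment polarity.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit for positive sentiment
        )

    def forward(self, acoustic_feats, visual_feats):
        fused = torch.cat([self.acoustic(acoustic_feats),
                           self.visual(visual_feats)], dim=-1)
        return self.classifier(fused).squeeze(-1)

# Smoke test on random feature vectors shaped (batch, feature_dim).
model = AudioVisualFusion()
logits = model(torch.randn(8, 74), torch.randn(8, 35))
print(logits.shape)  # torch.Size([8])

Concatenation is only one of several fusion strategies (tensor fusion, attention-based fusion, etc.); it is used here because it is the simplest way to show two feature streams feeding one sentiment classifier without any transcript input.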

Related research

06/20/2016 - MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos
04/17/2019 - Sentiment Analysis using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities
01/26/2020 - Multimodal Data Fusion based on the Global Workspace Theory
02/22/2018 - Deep Multimodal Learning for Emotion Recognition in Spoken Language
12/12/2018 - A Multimodal LSTM for Predicting Listener Empathic Responses Over Time
05/28/2021 - Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis
03/26/2021 - DBATES: DataBase of Audio features, Text, and visual Expressions in competitive debate Speeches