Deep Multimodal Learning for Emotion Recognition in Spoken Language

02/22/2018
by   Yue Gu, et al.

In this paper, we present a novel deep multimodal framework to predict human emotions from sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts high-level features from both text and audio via a hybrid deep multimodal structure, which considers the spatial information in text, the temporal information in audio, and high-level associations derived from low-level handcrafted features. Second, we fuse all features with a three-layer deep neural network that learns correlations across modalities, and we train the feature-extraction and fusion modules jointly, allowing optimal global fine-tuning of the entire structure. We evaluated the proposed framework on the IEMOCAP dataset. Our result shows promising performance, achieving 60.4%.
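The fusion stage described above — concatenating the per-modality features and passing them through a three-layer network — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, hidden size, and four-emotion output are assumptions for demonstration, and the upstream text/audio extractors are stubbed out as random vectors.

```python
import numpy as np

# Hypothetical dimensions (not specified in the abstract).
TEXT_DIM, AUDIO_DIM, HIDDEN, N_EMOTIONS = 128, 64, 256, 4

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Weights of a three-layer fusion DNN over the concatenated features.
W1 = rng.standard_normal((TEXT_DIM + AUDIO_DIM, HIDDEN)) * 0.01
W2 = rng.standard_normal((HIDDEN, HIDDEN)) * 0.01
W3 = rng.standard_normal((HIDDEN, N_EMOTIONS)) * 0.01

def fuse_and_classify(text_feat, audio_feat):
    """Concatenate modality features, then apply a 3-layer DNN head."""
    x = np.concatenate([text_feat, audio_feat], axis=-1)
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)
    return softmax(h2 @ W3)   # probability over emotion classes

# Stand-ins for the text and audio extractor outputs.
probs = fuse_and_classify(rng.standard_normal(TEXT_DIM),
                          rng.standard_normal(AUDIO_DIM))
print(probs.shape, float(probs.sum()))
```

In the paper's setting this head would be trained end to end with the extractors, so gradients from the fusion layers also fine-tune the modality-specific networks; the sketch above only shows the forward pass.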


Related research

07/06/2019 · Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition
This paper presents a novel deep neural network (DNN) for multimodal fus...

11/13/2019 · Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis
Multimodal language analysis often considers relationships between featu...

07/03/2018 · Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities
In the last decade, video blogs (vlogs) have become an extremely popular...

04/11/2023 · Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition
Automatic audio event recognition plays a pivotal role in making human r...

06/16/2021 · Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI
Speech sounds of spoken language are obtained by varying configuration o...

08/24/2022 · Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Insufficient Labelled Data
This paper proposes a multimodal emotion recognition system, VIsual Spok...

08/11/2021 · Abstractive Sentence Summarization with Guidance of Selective Multimodal Reference
Multimodal abstractive summarization with sentence output is to generate...
