Multi-Modal Emotion Detection with Transfer Learning

11/13/2020
by Amith Ananthram, et al.

Automated emotion detection in speech is a challenging task due to the complex interdependence between words and the manner in which they are spoken. It is made more difficult by the available datasets; their small size and incompatible labeling idiosyncrasies make it hard to build generalizable emotion detection systems. To address these two challenges, we present a multi-modal approach that first transfers learning from related tasks in speech and text to produce robust neural embeddings and then uses these embeddings to train a pLDA classifier that is able to adapt to previously unseen emotions and domains. We begin by training a multilayer TDNN on the task of speaker identification with the VoxCeleb corpora and then fine-tune it on the task of emotion identification with the Crema-D corpus. Using this network, we extract speech embeddings for Crema-D from each of its layers, generate and concatenate text embeddings for the accompanying transcripts using a fine-tuned BERT model, and then train an LDA-pLDA classifier on the resulting dense representations. We exhaustively evaluate the predictive power of every component: the TDNN alone, speech embeddings from each of its layers alone, text embeddings alone, and every combination thereof. Our best variant, trained on only VoxCeleb and Crema-D and evaluated on IEMOCAP, achieves an EER of 38.05%. Including a portion of IEMOCAP during training produces a 5-fold averaged EER of 25.72%. (For comparison, 44.71% of the gold-label annotations include at least one annotator who disagrees.)
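The embedding-and-classifier stages lend themselves to a compact sketch. The Python below is a minimal, hypothetical illustration, not the authors' code: it mean-pools BERT hidden states per transcript (the paper fine-tunes BERT; the stock bert-base-uncased checkpoint stands in here), substitutes random vectors for the per-layer TDNN speech embeddings (which in the paper come from a network pretrained on VoxCeleb and fine-tuned on Crema-D), concatenates the two modalities, and fits scikit-learn's LDA as the dimensionality-reduction step. The pLDA scorer itself, usually built with Kaldi-style tooling, is omitted; all variable names and the pooling choice are assumptions.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stock BERT stands in for the paper's fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def text_embedding(transcript: str) -> np.ndarray:
    """Mean-pooled BERT hidden states for one transcript (one plausible pooling)."""
    with torch.no_grad():
        inputs = tokenizer(transcript, return_tensors="pt", truncation=True)
        hidden = bert(**inputs).last_hidden_state  # shape (1, num_tokens, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Crema-D-style prompts, two utterances per emotion class (toy data).
transcripts = ["It's eleven o'clock", "It's eleven o'clock",
               "Don't forget a jacket", "Don't forget a jacket",
               "I think I have a doctor's appointment",
               "I think I have a doctor's appointment"]
emotions = np.array([0, 0, 1, 1, 2, 2])  # e.g. anger / fear / happy

# Random vectors stand in for per-layer TDNN speech embeddings.
rng = np.random.default_rng(0)
speech_emb = rng.normal(size=(len(transcripts), 512))

# Concatenate speech and text embeddings into one dense vector per utterance.
X = np.hstack([speech_emb, np.stack([text_embedding(t) for t in transcripts])])

# LDA reduces dimensionality ahead of pLDA scoring (n_components <= classes - 1).
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, emotions)
X_lda = lda.transform(X)
# A pLDA scorer would then be trained on X_lda; that stage is omitted here.
```

The reported numbers are equal error rates: the operating point where the false-acceptance and false-rejection rates coincide. One common way to read the EER off an ROC curve, again as a sketch rather than the paper's evaluation script:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """EER: point where false-positive and false-negative rates meet."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2  # average when the curves don't cross exactly

# Toy usage: 1 marks the target emotion, scores come from any detector.
print(equal_error_rate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```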

research
11/20/2020

Self-Supervised learning with cross-modal transformers for emotion recognition

Emotion recognition is a challenging task due to limited availability of...
research
04/14/2023

HCAM – Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

Emotion recognition in conversations is challenging due to the multi-mod...
research
10/08/2020

Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

Emotional state of a speaker is found to have significant effect in spee...
research
09/10/2020

Multi-modal embeddings using multi-task learning for emotion recognition

General embeddings like word2vec, GloVe and ELMo have shown a lot of suc...
research
11/15/2022

Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

In this work, we study the hypothesis that speaker identity embeddings e...
research
12/13/2021

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

Personal narratives (PN) - spoken or written - are recollections of fact...
research
11/05/2020

NUAA-QMUL at SemEval-2020 Task 8: Utilizing BERT and DenseNet for Internet Meme Emotion Analysis

This paper describes our contribution to SemEval 2020 Task 8: Memotion A...
