Multimodal and Multi-view Models for Emotion Recognition

06/24/2019
by Gustavo Aguilar, et al.

Studies on emotion recognition (ER) show that combining lexical and acoustic information yields more robust and accurate models. Most of these studies focus on settings where both modalities are available during training and evaluation. In practice, however, this is not always the case: obtaining automatic speech recognition (ASR) output can be a bottleneck in a deployment pipeline due to computational complexity or privacy constraints. To address this challenge, we study how to efficiently combine acoustic and lexical modalities during training while still providing a deployable acoustic model that does not require lexical inputs. We first experiment with multimodal models and two attention mechanisms to assess the extent of the benefits that lexical information can provide. Then, we frame the task as a multi-view learning problem, using a contrastive loss function to induce semantic information from the multimodal model into an acoustic-only network. Our multimodal model outperforms the previous state of the art reported on the USC-IEMOCAP dataset with lexical and acoustic information. Additionally, our multi-view-trained acoustic network significantly outperforms models trained exclusively on acoustic features.
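To make the multi-view idea concrete, here is a minimal sketch of the kind of contrastive objective the abstract describes: an acoustic-only encoder is pulled toward the embedding a multimodal (acoustic plus lexical) network produces for the same utterance, and pushed away from embeddings of other utterances in the batch. The names `acoustic_net` and `multimodal_net`, the margin value, and the margin-based hinge form are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of multi-view training with a margin-based contrastive
# loss. Network names and the margin are assumptions for illustration;
# the paper's exact loss may differ.
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(acoustic_emb, multimodal_emb, margin=0.5):
    """Pull each acoustic embedding toward the multimodal embedding of
    the same utterance; push it away from other utterances' embeddings."""
    a = F.normalize(acoustic_emb, dim=-1)    # (B, D) acoustic-only view
    m = F.normalize(multimodal_emb, dim=-1)  # (B, D) acoustic+lexical view
    sim = a @ m.t()                          # (B, B) cosine similarities
    pos = sim.diag()                         # matched (same-utterance) pairs
    # Mask the diagonal so matched pairs are not also counted as negatives.
    neg = sim - torch.eye(sim.size(0), device=sim.device) * 1e9
    # Hinge: every mismatched pair should score at least `margin` below
    # the matched pair in its row.
    return F.relu(margin - pos.unsqueeze(1) + neg).mean()

# Usage sketch: the multimodal network consumes both inputs, while the
# deployable acoustic network never sees lexical features.
# loss = multiview_contrastive_loss(acoustic_net(audio),
#                                   multimodal_net(audio, text).detach())
```

Detaching the multimodal embeddings (as in the usage comment) treats the multimodal network as the teacher view, so only the acoustic network is updated by this loss; whether the teacher is frozen or jointly trained is a design choice not specified by the abstract.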

Related research

Multimodal Embeddings from Language Models (09/10/2019)
Word embeddings such as ELMo have recently been shown to model word sema...

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations (02/08/2022)
Deriving multimodal representations of audio and lexical inputs is a cen...

Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning (08/23/2019)
Various psychological factors affect how individuals express emotions. Y...

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs (06/29/2021)
We present two multimodal fusion-based deep learning models that consume...

Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning (11/11/2020)
We propose a novel transfer learning method for speech emotion recogniti...

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding (05/23/2023)
Attention-based encoder-decoder (AED) models have shown impressive perfo...
