A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

07/21/2023
by Zeinab Sadat Taghavi, et al.

Speech Emotion Recognition (SER) is a challenging task. In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. We assess our approach through two experiments. The first, named Modality-Conversion, employs an automatic speech recognition (ASR) system followed by a text classifier. The second, named Modality-Conversion++, assumes perfect ASR output and investigates the impact of modality conversion on SER. Our findings indicate that the first method yields substantial results, while the second outperforms state-of-the-art (SOTA) speech-based approaches in terms of SER weighted-F1 (WF1) score on the MELD dataset. This research highlights the potential of modality conversion for tasks that can be conducted in alternative modalities.
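The two-stage idea in the abstract (transcribe the speech, then classify the text) and the weighted-F1 metric can be illustrated with a minimal sketch. This is not the authors' implementation: `modality_conversion`, `weighted_f1`, `toy_asr`, `toy_classifier`, the clip names, and the labels are all hypothetical stand-ins, and a real system would plug in an actual ASR model and a trained text classifier.

```python
from collections import Counter
from typing import Callable, Sequence


def modality_conversion(
    utterances: Sequence[bytes],
    transcribe: Callable[[bytes], str],
    classify_text: Callable[[str], str],
) -> list[str]:
    """Convert each speech utterance to text, then classify the text."""
    return [classify_text(transcribe(audio)) for audio in utterances]


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Hypothetical stand-ins: a lookup table plays the role of the ASR system,
# and a keyword rule plays the role of the text classifier.
fake_transcripts = {
    b"clip-1": "i am so happy",
    b"clip-2": "this is terrible",
    b"clip-3": "okay",
}


def toy_asr(audio: bytes) -> str:
    return fake_transcripts[audio]


def toy_classifier(text: str) -> str:
    if "happy" in text:
        return "joy"
    if "terrible" in text:
        return "anger"
    return "neutral"


gold = ["joy", "sadness", "neutral"]  # illustrative labels, not MELD data
pred = modality_conversion([b"clip-1", b"clip-2", b"clip-3"], toy_asr, toy_classifier)
score = weighted_f1(gold, pred)
print(pred, round(score, 3))
```

Under this framing, the "perfect ASR" setting of Modality-Conversion++ would amount to replacing `transcribe` with a lookup of reference transcripts while keeping the text classifier unchanged.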


