ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

05/25/2023
by Yuanchao Li, et al.

In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to compensate for the inherent variability of the audio. However, most research relies on human-annotated text, which hinders the development of practical SER systems. To overcome this challenge, we investigate how Automatic Speech Recognition (ASR) performs on emotional speech: we analyze ASR performance on emotion corpora and examine the distribution of word errors and confidence scores in ASR transcripts to gain insight into how emotion affects ASR. To ensure generalizability, we use four ASR systems (Kaldi ASR, wav2vec2, Conformer, and Whisper) and three corpora (IEMOCAP, MOSI, and MELD). Additionally, we conduct text-based SER on ASR transcripts with increasing word error rates to investigate how ASR affects SER. The objective of this study is to uncover the relationship and mutual impact of ASR and SER, in order to facilitate ASR adaptation to emotional speech and the use of SER in the real world.
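The abstract refers to word error rate (WER) without defining it. As a rough illustration only, the sketch below computes WER under its common definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for this sketch and are not taken from the paper, which may normalize text or score transcripts differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper: tokenization is plain whitespace splitting,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts (made up for this sketch).
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.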

research
05/23/2018

ASR-based Features for Emotion Recognition: A Transfer Learning Approach

During the last decade, the applications of signal processing have drast...
research
11/18/2022

A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora

Speech Emotion Recognition (SER) is one of the essential perceptual meth...
research
03/14/2022

RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

ASR Error Detection (AED) models aim to post-process the output of Autom...
research
08/14/2023

Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations

Although automatic emotion recognition (AER) has recently drawn signific...
research
07/21/2023

A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

Speech Emotion Recognition (SER) is a challenging task. In this paper, w...
research
03/27/2022

A Dataset for Speech Emotion Recognition in Greek Theatrical Plays

Machine learning methodologies can be adopted in cultural applications a...
research
10/27/2020

Emotion recognition by fusing time synchronous and time asynchronous representations

In this paper, a novel two-branch neural network model structure is prop...
