On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

05/21/2023
by Lokesh Bansal, et al.

New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments. In this paper, we investigate a joint ASR-SER multitask learning approach in a low-resource setting and show that improvements are observed not only in SER, but also in ASR. We also investigate the robustness of such jointly trained models to the presence of background noise, babble, and music. Experimental results on the IEMOCAP dataset show that joint learning can improve ASR word error rate (WER) and SER classification accuracy by 10.7 in clean scenarios. In noisy scenarios, results on data augmented with MUSAN show that the joint approach outperforms the independent ASR and SER approaches across many noisy conditions. Overall, the joint ASR-SER approach yielded more noise-resistant models than the independent ASR and SER approaches.
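
The ASR results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code, and it applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration: tokenizes on whitespace only
    and does no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions, which is worth keeping in mind when comparing clean and noisy conditions.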

research
10/29/2021

Fusing ASR Outputs in Joint Training for Speech Emotion Recognition

Alongside acoustic information, linguistic features based on speech tran...
research
09/14/2016

An Adaptive Psychoacoustic Model for Automatic Speech Recognition

Compared with automatic speech recognition (ASR), the human auditory sys...
research
01/05/2019

Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

For real-world speech recognition applications, noise robustness is stil...
research
07/11/2022

pMCT: Patched Multi-Condition Training for Robust Speech Recognition

We propose a novel Patched Multi-Condition Training (pMCT) method for ro...
research
10/05/2021

Disambiguation-BERT for N-best Rescoring in Low-Resource Conversational ASR

We study the inclusion of past conversational context through BERT langu...
research
06/06/2023

Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses

Alzheimer's Disease (AD) is the world's leading neurodegenerative diseas...
