Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations

08/14/2023
by Wen Wu, et al.

Although automatic emotion recognition (AER) has recently drawn significant research interest, most current AER studies use manually segmented utterances, which are usually unavailable to dialogue systems. This paper proposes integrating AER with automatic speech recognition (ASR) and speaker diarisation (SD) in a jointly trained system. Distinct output layers are built on a shared encoder for four sub-tasks: AER, ASR, voice activity detection and speaker classification. Taking the audio of a conversation as input, the integrated system finds all speech segments and transcribes the corresponding emotion classes, word sequences and speaker identities. Two metrics, based on time-weighted emotion and speaker classification errors, are proposed to evaluate AER performance under automatic segmentation. Results on the IEMOCAP dataset show that the proposed system consistently outperforms two baselines built from separately trained single-task systems on AER, ASR and SD.
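The abstract does not define the two proposed metrics, so the following is only an illustrative sketch of what a time-weighted emotion error could look like once segment boundaries are automatic rather than manual. It is patterned loosely on how diarisation error is usually scored and is not the paper's definition. The function names, the `(start, end, label)` segment format, the choice to count missed speech and false alarms as errors, and the normalisation by reference speech time are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical segment format: (start_sec, end_sec, emotion_label).
Segment = Tuple[float, float, str]


def label_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the label of the segment covering time t, or None for non-speech."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None


def time_weighted_error(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Fraction of reference speech time scored as wrong (assumed definition).

    Time counts as an error when the reference and hypothesis labels differ,
    which covers emotion confusions, missed speech and false alarms.
    Segments within each list are assumed not to overlap.
    """
    ref_time = sum(end - start for start, end, _ in reference)
    if ref_time <= 0:
        raise ValueError("reference must contain some speech")

    # Cut the timeline at every segment boundary so that each resulting
    # interval has a single reference label and a single hypothesis label.
    bounds = sorted({t for s, e, _ in reference + hypothesis for t in (s, e)})

    error_time = 0.0
    for left, right in zip(bounds, bounds[1:]):
        midpoint = (left + right) / 2
        if label_at(reference, midpoint) != label_at(hypothesis, midpoint):
            error_time += right - left
    return error_time / ref_time


reference = [(0.0, 2.0, "happy"), (3.0, 5.0, "sad")]
hypothesis = [(0.0, 2.5, "happy"), (3.0, 5.0, "angry")]
print(time_weighted_error(reference, hypothesis))  # 0.625
```

In this toy example the hypothesis overruns the first segment by 0.5 s and mislabels the second 2 s segment, giving 2.5 s of error over 4 s of reference speech. A speaker-aware variant could compare `(speaker, emotion)` pairs instead of emotion labels alone, but that would require mapping hypothesised speakers to reference speakers, which this sketch does not attempt.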


research
05/25/2023

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

In Speech Emotion Recognition (SER), textual data is often used alongsid...
research
10/27/2020

Emotion recognition by fusing time synchronous and time asynchronous representations

In this paper, a novel two-branch neural network model structure is prop...
research
11/23/2022

Whose Emotion Matters? Speaker Detection without Prior Knowledge

The task of emotion recognition in conversations (ERC) benefits from the...
research
06/09/2020

audino: A Modern Annotation Tool for Audio and Speech

In this paper, we introduce a collaborative and modern annotation tool f...
research
01/02/2020

Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

Research on speech processing has traditionally considered the task of d...
research
04/08/2020

Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech

This paper considers the impact of automatic segmentation on the fully-a...
research
07/21/2023

A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

Speech Emotion Recognition (SER) is a challenging task. In this paper, w...
