Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

10/08/2020
by   Biswajit Dev Sarma, et al.

The emotional state of a speaker has a significant effect on speech production, causing speech to deviate from that produced in a neutral state. This makes identifying speakers across different emotions a challenging task, since speaker models are generally trained on neutral speech. In this work, we propose to overcome this problem by creating emotion invariant speaker embeddings. We learn an extractor network that maps test embeddings of different emotions, obtained using an i-vector based system, to an emotion invariant space. The resultant test embeddings thus become emotion invariant and thereby compensate for the mismatch between emotional states. The studies are conducted using four emotion classes from the IEMOCAP database. We obtain an absolute improvement of 2.6 using emotion invariant speaker embeddings over an average speaker model based framework with different emotions.
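The core idea above can be sketched in code: a small feed-forward extractor network maps an emotional i-vector into an emotion-invariant space, and the resulting embedding is scored against an enrolled speaker model with cosine similarity. This is a minimal illustrative sketch, not the paper's implementation: the layer sizes, two-layer architecture, and random (untrained) weights are all assumptions; in the actual system the parameters would be learned from emotional/neutral training pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions are illustrative assumptions, not the paper's configuration.
IVECTOR_DIM = 400   # typical i-vector dimensionality
HIDDEN_DIM = 256
EMBED_DIM = 128     # size of the emotion-invariant embedding

# Randomly initialised weights stand in for parameters that would be
# learned, e.g. by pulling emotional and neutral embeddings of the same
# speaker together in the target space.
W1 = rng.standard_normal((IVECTOR_DIM, HIDDEN_DIM)) * 0.01
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.standard_normal((HIDDEN_DIM, EMBED_DIM)) * 0.01
b2 = np.zeros(EMBED_DIM)

def extract_invariant(ivector: np.ndarray) -> np.ndarray:
    """Map a test i-vector to the (hypothetical) emotion-invariant space."""
    h = np.tanh(ivector @ W1 + b1)   # nonlinear hidden layer
    e = h @ W2 + b2
    return e / np.linalg.norm(e)     # length-normalise, as is common for embeddings

def score(test_emb: np.ndarray, speaker_model: np.ndarray) -> float:
    """Cosine similarity between a test embedding and an enrolled speaker model."""
    return float(
        test_emb @ speaker_model
        / (np.linalg.norm(test_emb) * np.linalg.norm(speaker_model))
    )

test_ivec = rng.standard_normal(IVECTOR_DIM)
emb = extract_invariant(test_ivec)
print(emb.shape)  # (128,)
```

At test time, each emotional utterance's i-vector would be passed through `extract_invariant` before scoring, so the comparison against the (neutral) speaker model happens in the compensated space.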


Related research:

- 12/07/2021: Multi-speaker Emotional Text-to-speech Synthesizer
  We present a methodology to train our multi-speaker emotional text-to-sp...

- 01/09/2022: Emotional Speaker Identification using a Novel Capsule Nets Model
  Speaker recognition systems are widely used in various applications to i...

- 08/15/2022: Analysis of impact of emotions on target speech extraction and speech separation
  Recently, the performance of blind speech separation (BSS) and target sp...

- 06/14/2023: EmoStim: A Database of Emotional Film Clips with Discrete and Componential Assessment
  Emotion elicitation using emotional film clips is one of the most common...

- 11/13/2020: Multi-Modal Emotion Detection with Transfer Learning
  Automated emotion detection in speech is a challenging task due to the c...

- 09/14/2023: Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures
  Despite recent strides made in Speech Separation, most models are traine...

- 10/08/2021: Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
  In expressive speech synthesis, there are high requirements for emotion ...
