Emotional Speaker Identification using a Novel Capsule Nets Model

01/09/2022
by   Ali Bou Nassif, et al.

Speaker recognition systems are widely used to identify a person by their voice; however, the high variability of speech signals makes this a challenging task. Emotional variation is particularly difficult to handle because emotions alter a person's voice characteristics, so the acoustic features differ from those used to train models on neutral speech. As a result, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although convolutional neural networks (CNNs) have driven considerable advances in speaker identification, their pooling operations discard the spatial relationships between low-level features. Capsule networks (CapsNets), a deep learning architecture recently introduced to overcome this inadequacy by preserving the pose relationships between low-level features, motivate this study, which investigates their performance in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated on three distinct speech databases: the Emirati Speech Database, the SUSAS dataset, and the open-access RAVDESS dataset. The proposed model is also compared to baseline systems. Experimental results demonstrate that the proposed CapsNet model trains faster and outperforms current state-of-the-art schemes. The effect of the routing algorithm on identification performance was also studied by varying the number of routing iterations, both with and without a decoder network.
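The routing-by-agreement mechanism the abstract varies (the number of routing iterations) can be sketched in a few lines. This is a minimal NumPy illustration of dynamic routing between capsules in the style of Sabour et al. (2017), not the paper's actual model; all shapes, names, and the random input are illustrative assumptions.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Non-linearity that shrinks a vector's length into [0, 1) while keeping its direction.
    sq_norm = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: prediction vectors of shape (in_caps, out_caps, dim).
    in_caps, out_caps, _ = u_hat.shape
    b = np.zeros((in_caps, out_caps))  # routing logits, initialised to zero
    for _ in range(n_iters):
        # Coupling coefficients: softmax of the logits over the output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('io,iod->od', c, u_hat)   # weighted sum per output capsule
        v = squash(s)                            # output capsule vectors
        b += np.einsum('iod,od->io', u_hat, v)   # increase logits where prediction agrees with output
    return v

# Toy example: 8 input capsules routed to 4 output capsules of dimension 16.
rng = np.random.default_rng(0)
u_hat = rng.standard_normal((8, 4, 16))
v = dynamic_routing(u_hat, n_iters=3)
print(v.shape)  # (4, 16)
```

Each output capsule's vector length (always below 1 after squashing) can be read as the probability that the entity it represents is present, which is why the number of routing iterations is a meaningful hyperparameter to study.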

Related research:

10/08/2020  Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech
Emotional state of a speaker is found to have significant effect in spee...

03/29/2019  Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise -
Speakers usually adjust their way of talking in noisy environments invol...

10/23/2022  Speaker Identification from Emotional and Noisy Speech Data Using Learned Voice Segregation and Speech VGG
Speech signals are subjected to more acoustic interference and emotional...

02/11/2021  CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
This work aims at intensifying text-independent speaker identification p...

04/15/2020  Speaker Recognition in Bengali Language from Nonlinear Features
At present Automatic Speaker Recognition system is a very important issu...

10/18/2022  Risk of Re-identification for Shared Clinical Speech Recordings
Large, curated datasets are required to leverage speech-based tools in h...

11/11/2021  Towards an Efficient Voice Identification Using Wav2Vec2.0 and HuBERT Based on the Quran Reciters Dataset
Current authentication and trusted systems depend on classical and biome...
