Speaker Identification from emotional and noisy speech data using learned voice segregation and Speech VGG

10/23/2022
by Shibani Hamsa, et al.

Speech signals are subject to more acoustic interference and emotional variability than most other signals, and noisy, emotion-laden speech remains a challenge for real-time speech processing applications. An effective way to segregate the dominant speech signal from external influences is therefore essential: an ideal system should accurately recognize the required auditory events in a complex scene captured under unfavorable conditions. This paper proposes a novel approach to speaker identification in such conditions, namely emotion and interference, using a pre-trained Deep Neural Network mask for voice segregation together with Speech VGG. The proposed model outperforms recent literature on English and Arabic emotional speech data, reporting average speaker identification rates of 85.2%, 87.0%, and 86.6% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Speech Under Simulated and Actual Stress (SUSAS) dataset, and the Emirati-accented Speech Dataset (ESD), respectively.
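To make the described two-stage pipeline concrete, the sketch below shows how such a system could be wired up in PyTorch: a pre-trained DNN estimates a soft time-frequency mask that segregates the target voice from interference, and a VGG-style classifier (standing in for Speech VGG) identifies the speaker from the masked spectrogram. All layer sizes, the mask type, and the speaker count are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch, assuming a soft-ratio-mask front end and a small
# VGG-style back end; hyperparameters below are illustrative only.
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    """Estimates a soft mask over the noisy magnitude spectrogram."""
    def __init__(self, n_freq=257):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, n_freq), nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mag):          # mag: (batch, frames, n_freq)
        return self.net(mag)         # soft mask, same shape as input

class SpeechVGGClassifier(nn.Module):
    """VGG-style convolutional classifier over the segregated spectrogram."""
    def __init__(self, n_speakers=24):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, n_speakers),
        )

    def forward(self, spec):         # spec: (batch, 1, frames, n_freq)
        return self.head(self.features(spec))

def identify_speaker(noisy_mag, mask_net, vgg_net):
    """Apply the learned mask, then classify the speaker."""
    mask = mask_net(noisy_mag)                  # segregate the target voice
    cleaned = noisy_mag * mask                  # masked magnitude spectrogram
    logits = vgg_net(cleaned.unsqueeze(1))      # add channel dim for the conv net
    return logits.argmax(dim=-1)                # predicted speaker index

if __name__ == "__main__":
    mask_net, vgg_net = MaskEstimator(), SpeechVGGClassifier()
    dummy = torch.rand(2, 100, 257)             # (batch, frames, freq bins)
    print(identify_speaker(dummy, mask_net, vgg_net))

In a real system the mask estimator would be trained separately on noisy/clean pairs and frozen before the classifier is trained, which is consistent with the "pre-trained Deep Neural Network mask" described in the abstract.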


Related research

02/11/2021  CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
This work aims at intensifying text-independent speaker identification p...

09/03/2018  Three-Stage Speaker Verification Architecture in Emotional Talking Environments
Speaker verification performance in neutral talking environment is usual...

03/29/2019  Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise -
Speakers usually adjust their way of talking in noisy environments invol...

01/09/2022  Emotional Speaker Identification using a Novel Capsule Nets Model
Speaker recognition systems are widely used in various applications to i...

11/11/2021  Towards an Efficient Voice Identification Using Wav2Vec2.0 and HuBERT Based on the Quran Reciters Dataset
Current authentication and trusted systems depend on classical and biome...

02/27/2018  Deep factorization for speech signal
Various informative factors mixed in speech signals, leading to great di...

08/25/2020  ANGUS: Real-time manipulation of vocal roughness for emotional speech transformations
Vocal arousal, the non-linear acoustic features taken on by human and an...
