DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

07/06/2023
by Zhifeng Wang, et al.

Speaker recognition is a biometric modality that uses a speaker's speech segments to recognize the speaker's identity, determining whether the test speaker is one of the enrolled speakers. To improve the robustness of the i-vector framework under cross-channel conditions and to explore a novel method for applying deep learning to speaker recognition, stacked auto-encoders are used to extract an abstract representation of the i-vector instead of applying PLDA. After pre-processing and feature extraction, speaker- and channel-independent speech is used for UBM training. The UBM is then used to extract the i-vectors of the enrollment and test speech. Unlike the traditional i-vector framework, which uses linear discriminant analysis (LDA) to reduce dimensionality and increase the discrimination between speaker subspaces, this work uses stacked auto-encoders to reconstruct the i-vector at a lower dimension, after which different classifiers can be chosen for the final classification. The experimental results show that the proposed method achieves better performance than the state-of-the-art method.
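
The back-end described above (compressing i-vectors with stacked auto-encoders, then handing the compressed codes to a classifier) can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the "i-vectors" are synthetic Gaussian clusters rather than UBM-derived vectors, the layer sizes (20 to 10 to 4), the tanh activation, the greedy layer-wise training schedule, and the cosine-scoring classifier are all hypothetical choices, and the helper names `train_autoencoder`, `train_stack`, `encode`, and `cosine_identify` are invented here.

```python
import numpy as np


def train_autoencoder(x, hidden_dim, epochs=300, lr=0.1, seed=0):
    """Fit one auto-encoder layer (tanh encoder, linear decoder) by
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, size=(d, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    w_dec = rng.normal(0.0, 0.1, size=(hidden_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w_enc + b_enc)              # encode
        err = (h @ w_dec + b_dec) - x               # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size                # d(loss)/d(reconstruction)
        g_pre = (g_out @ w_dec.T) * (1.0 - h ** 2)  # back-propagate through tanh
        w_dec -= lr * (h.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        w_enc -= lr * (x.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return (w_enc, b_enc), losses


def train_stack(x, layer_dims):
    """Greedy layer-wise training: each layer reconstructs the codes
    produced by the layer below it (an assumed training schedule)."""
    encoders, histories, codes = [], [], x
    for i, dim in enumerate(layer_dims):
        enc, losses = train_autoencoder(codes, dim, seed=i)
        encoders.append(enc)
        histories.append(losses)
        codes = np.tanh(codes @ enc[0] + enc[1])
    return encoders, histories


def encode(x, encoders):
    """Pass vectors through every trained encoder layer in order."""
    for w, b in encoders:
        x = np.tanh(x @ w + b)
    return x


def cosine_identify(test_codes, speaker_models):
    """Return, for each test code, the index of the most similar speaker model."""
    a = test_codes / np.linalg.norm(test_codes, axis=1, keepdims=True)
    b = speaker_models / np.linalg.norm(speaker_models, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


# Synthetic stand-in for i-vectors: one Gaussian cluster per speaker.
rng = np.random.default_rng(42)
n_speakers, per_speaker, ivec_dim = 5, 40, 20
centers = rng.normal(size=(n_speakers, ivec_dim))
labels = np.repeat(np.arange(n_speakers), per_speaker)
ivectors = centers[labels] + 0.3 * rng.normal(size=(labels.size, ivec_dim))
ivectors = (ivectors - ivectors.mean(axis=0)) / ivectors.std(axis=0)

# Even-indexed vectors act as enrollment data, odd-indexed ones as test data.
enroll = np.arange(labels.size) % 2 == 0
encoders, histories = train_stack(ivectors[enroll], layer_dims=[10, 4])

# Speaker model = mean compressed code of that speaker's enrollment vectors.
enroll_codes = encode(ivectors[enroll], encoders)
models = np.stack([enroll_codes[labels[enroll] == s].mean(axis=0)
                   for s in range(n_speakers)])

predicted = cosine_identify(encode(ivectors[~enroll], encoders), models)
accuracy = float(np.mean(predicted == labels[~enroll]))
print(f"compressed {ivec_dim}-dim vectors to {models.shape[1]} dims, "
      f"accuracy={accuracy:.2f}")
```

Because the compressed codes are ordinary fixed-length vectors, `cosine_identify` could be swapped for any other classifier without touching the auto-encoder stack, which is the flexibility the abstract points to. The accuracy printed on this toy data says nothing about performance on real speech.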

research
02/25/2019

Channel adversarial training for cross-channel text-independent speaker recognition

The conventional speaker recognition frameworks (e.g., the i-vector and ...
research
06/01/2021

Supervised Speech Representation Learning for Parkinson's Disease Classification

Recently proposed automatic pathological speech classification technique...
research
04/11/2015

Gradual Training Method for Denoising Auto Encoders

Stacked denoising auto encoders (DAEs) are well known to learn useful de...
research
02/14/2020

Speaker Diarization with Region Proposal Network

Speaker diarization is an important pre-processing step for many speech ...
research
07/31/2019

Quantifying Cochlear Implant Users' Ability for Speaker Identification using CI Auditory Stimuli

Speaker recognition is a biometric modality that uses underlying speech ...
research
10/28/2022

Universal speaker recognition encoders for different speech segments duration

Creating universal speaker encoders which are robust for different acous...
research
04/25/2022

Back-ends Selection for Deep Speaker Embeddings

Probabilistic Linear Discriminant Analysis (PLDA) was the dominant and n...
