Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

05/03/2023
by Iván López-Espejo, et al.

Despite the maturity of modern speaker verification technology, its performance still degrades significantly on non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. The method models the joint distribution of the vocal effort transfer vector and the non-neutrally-phonated embedding, and it operates in a principal component analysis (PCA) domain to cope with the scarcity of non-neutrally-phonated speech data. Experiments are carried out using a cutting-edge speaker verification system that integrates a powerful self-supervised pre-trained model for speech representation. Compared with a state-of-the-art embedding compensation method, the proposed MMSE estimator achieves a lower equal error rate on shouted speech and a competitive equal error rate on whispered speech.
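To make the idea of MMSE-based embedding compensation in a PCA domain more concrete, the following is a minimal sketch, not the authors' implementation. It assumes several things the abstract does not state: that paired neutral and non-neutral embeddings of the same utterances are available for training, that the transfer vector is defined as the neutral embedding minus the non-neutral one, and that the joint distribution is Gaussian, in which case the MMSE estimate is the linear conditional mean. All function names, the `ridge` regularizer, and the number of retained components are hypothetical.

```python
import numpy as np


def fit_pca(data, n_components):
    """Return the mean and the top principal directions (dim x k) of `data`."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_mmse(neutral, non_neutral, n_components, ridge=1e-6):
    """Fit a linear (joint-Gaussian) MMSE estimator of the transfer vector.

    Assumptions (not from the source): rows of `neutral` and `non_neutral`
    are paired, and the transfer vector is neutral minus non-neutral.
    """
    transfer = neutral - non_neutral
    mean_y, basis_y = fit_pca(non_neutral, n_components)
    mean_t, basis_t = fit_pca(transfer, n_components)

    # Project both spaces into their (centered) PCA domains.
    y = (non_neutral - mean_y) @ basis_y
    t = (transfer - mean_t) @ basis_t

    n = y.shape[0]
    cov_yy = y.T @ y / n + ridge * np.eye(n_components)  # regularized
    cov_ty = t.T @ y / n
    gain = cov_ty @ np.linalg.inv(cov_yy)  # E[t | y] = gain @ y (zero means)

    return {"mean_y": mean_y, "basis_y": basis_y,
            "mean_t": mean_t, "basis_t": basis_t, "gain": gain}


def compensate(model, embedding):
    """Estimate the transfer vector and add it to a non-neutral embedding."""
    y = (embedding - model["mean_y"]) @ model["basis_y"]
    t_hat = y @ model["gain"].T
    transfer_hat = model["mean_t"] + t_hat @ model["basis_t"].T
    return embedding + transfer_hat
```

In this sketch, keeping fewer principal components reduces the number of covariance parameters that must be estimated, which is one plausible reading of how a PCA domain helps when non-neutral training data are scarce. The compensated embedding would then be scored against enrollment embeddings in the usual way.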


research
03/30/2021

Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

Speaker verification has been studied mostly under the single-talker con...
research
03/16/2022

Raw waveform speaker verification for supervised and self-supervised learning

Speaker verification models that directly operate upon raw waveforms are...
research
07/21/2020

Optimization of data-driven filterbank for automatic speaker verification

Most of the speech processing applications use triangular filters spaced...
research
12/11/2020

Exploring wav2vec 2.0 on speaker verification and language identification

Wav2vec 2.0 is a recently proposed self-supervised framework for speech ...
research
08/06/2020

Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

The performance of speaker verification systems degrades when vocal effo...
research
09/26/2019

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

Voice activity detection (VAD), which classifies frames as speech or non...
research
05/21/2023

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

Self-supervised speech representations are known to encode both speaker ...
