Within-sample variability-invariant loss for robust speaker recognition under noisy environments

02/03/2020
by   Danwei Cai, et al.
0

Despite the significant improvements in speaker recognition enabled by deep neural networks, unsatisfactory performance persists under noisy environments. In this paper, we train the speaker embedding network to learn the "clean" embedding of the noisy utterance. Specifically, the network is trained with the original speaker identification loss with an auxiliary within-sample variability-invariant loss. This auxiliary variability-invariant loss is used to learn the same embedding among the clean utterance and its noisy copies and prevents the network from encoding the undesired noises or variabilities into the speaker representation. Furthermore, we investigate the data preparation strategy for generating clean and noisy utterance pairs on-the-fly. The strategy generates different noisy copies for the same clean utterance at each training step, helping the speaker embedding network generalize better under noisy environments. Experiments on VoxCeleb1 indicate that the proposed training framework improves the performance of the speaker verification system in both clean and noisy conditions.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/03/2021

PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction

Speech enhancement aims to improve the perceptual quality of the speech ...
research
11/23/2018

Training Multi-Task Adversarial Network For Extracting Noise-Robust Speaker Embedding

Under noisy environments, to achieve the robust performance of speaker r...
research
07/23/2020

Augmentation adversarial training for unsupervised speaker recognition

The goal of this work is to train robust speaker recognition models with...
research
05/25/2021

Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework

The performance of speaker recognition system is highly dependent on the...
research
04/05/2022

Design Guidelines for Inclusive Speaker Verification Evaluation Datasets

Speaker verification (SV) provides billions of voice-enabled devices wit...
research
03/29/2022

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning

Most text-to-speech (TTS) methods use high-quality speech corpora record...
research
12/07/2021

Robust Speech Representation Learning via Flow-based Embedding Regularization

Over the recent years, various deep learning-based methods were proposed...

Please sign up or login with your details

Forgot password? Click here to reset