Deep Normalization for Speaker Vectors

04/07/2020
by Yunqi Cai, et al.

Deep speaker embedding has demonstrated state-of-the-art performance in audio speaker recognition (SRE). However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for each individual speaker and non-homogeneous across speakers. These irregular distributions can seriously degrade SRE performance, especially with the popular PLDA scoring method, which assumes homogeneous Gaussian distributions. In this paper, we argue that deep speaker vectors require deep normalization, and we propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model. We demonstrate the effectiveness of the proposed approach in experiments on the widely used SITW and CNCeleb corpora. In these experiments, DNF-based normalization delivered substantial performance gains and also showed strong generalization capability in out-of-domain tests.
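To make the idea concrete, here is a minimal sketch of the kind of likelihood a discriminative normalization flow could optimize, under stated assumptions. An invertible map f sends a speaker vector x to a latent z, and each speaker k is assumed to have a Gaussian N(mu_k, I) in latent space with a shared identity covariance, so by the change-of-variables rule log p(x | k) = log N(f(x); mu_k, I) + log|det df/dx|. The names `ElementwiseAffineFlow`, `log_normal_unit`, and `dnf_log_likelihood` are hypothetical. A single element-wise affine layer stands in for the deeper invertible network a real DNF would use, and no training loop is shown.

```python
import math
from typing import List, Sequence


class ElementwiseAffineFlow:
    """Hypothetical one-layer flow: z = (x - shift) * exp(log_scale), per dimension."""

    def __init__(self, shift: Sequence[float], log_scale: Sequence[float]):
        if len(shift) != len(log_scale):
            raise ValueError("shift and log_scale must have the same length")
        self.shift = list(shift)
        self.log_scale = list(log_scale)

    def forward(self, x: Sequence[float]) -> List[float]:
        # Map a speaker vector into the latent (normalized) space.
        return [(xi - b) * math.exp(s) for xi, b, s in zip(x, self.shift, self.log_scale)]

    def inverse(self, z: Sequence[float]) -> List[float]:
        # Exact inverse of forward(), as any normalizing flow requires.
        return [zi * math.exp(-s) + b for zi, b, s in zip(z, self.shift, self.log_scale)]

    def log_abs_det_jacobian(self) -> float:
        # The Jacobian is diagonal with entries exp(log_scale), so log|det| = sum(log_scale).
        return sum(self.log_scale)


def log_normal_unit(z: Sequence[float], mean: Sequence[float]) -> float:
    """log N(z; mean, I) for an isotropic, unit-variance Gaussian."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return -0.5 * (d * math.log(2.0 * math.pi) + sq_dist)


def dnf_log_likelihood(flow: ElementwiseAffineFlow,
                       x: Sequence[float],
                       class_mean: Sequence[float]) -> float:
    """Class-conditional log-likelihood of x under the flow (change of variables)."""
    z = flow.forward(x)
    return log_normal_unit(z, class_mean) + flow.log_abs_det_jacobian()


if __name__ == "__main__":
    flow = ElementwiseAffineFlow(shift=[0.0, 0.0], log_scale=[math.log(2.0), math.log(2.0)])
    print(dnf_log_likelihood(flow, [1.0, 1.0], class_mean=[2.0, 2.0]))
```

In this sketch, maximizing the sum of `dnf_log_likelihood` over labeled training vectors (with the class means learned jointly) would push each speaker's latent vectors toward a Gaussian with a common covariance, which is the form PLDA assumes; at test time, `flow.forward(x)` would serve as the normalized vector passed to the scorer.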

research
10/30/2020

Deep Speaker Vector Normalization with Maximum Gaussianality Training

Deep speaker embedding represents the state-of-the-art technique for spe...
research
04/07/2019

VAE-based regularization for deep speaker embedding

Deep speaker embedding has achieved state-of-the-art performance in spea...
research
11/08/2018

Gaussian-Constrained training for speaker verification

Neural models, in particular the d-vector and x-vector architectures, ha...
research
03/24/2018

Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors

The standard state-of-the-art backend for text-independent speaker recog...
research
02/22/2016

Blind score normalization method for PLDA based speaker recognition

Probabilistic Linear Discriminant Analysis (PLDA) has become state-of-th...
research
05/25/2023

Ordered and Binary Speaker Embedding

Modern speaker recognition systems represent utterances by embedding vec...
research
08/03/2017

Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition

Recently in speaker recognition, performance degradation due to the chan...
