Back-ends Selection for Deep Speaker Embeddings

04/25/2022
by   Zhuo Li, et al.

Probabilistic Linear Discriminant Analysis (PLDA) was the dominant and necessary back-end for early speaker recognition approaches such as i-vector and x-vector. However, with the development of neural networks and margin-based loss functions, we can now obtain deep speaker embeddings (DSEs), which offer larger inter-class separation and smaller intra-class distances. For such discriminative embeddings, PLDA seems unnecessary or even counterproductive, and cosine similarity scoring (Cos) outperforms PLDA in some situations. Motivated by this, we systematically explore how to select the back-end (Cos or PLDA) for deep speaker embeddings to achieve better performance in different situations. By analyzing PLDA and the properties of DSEs extracted from models with different numbers of segment-level layers, we conjecture that Cos is better in same-domain situations and PLDA is better in cross-domain situations. We conduct experiments on the VoxCeleb and NIST SRE datasets in four application situations, combining single- or multi-domain training with same- or cross-domain testing, to validate our conjecture and to briefly explain why back-end adaptation algorithms work.
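To make the two back-ends concrete, the following is a minimal sketch of how each might score a pair of embeddings. It is not the authors' implementation. The function names (`length_normalize`, `cosine_score`, `plda_llr`) are hypothetical, and the PLDA part assumes a simplified two-covariance model with zero mean and diagonal between-speaker and within-speaker variances, which in practice would be estimated from labelled training data.

```python
import math


def length_normalize(x):
    """Scale an embedding to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


def cosine_score(enroll, test):
    """Cos back-end: inner product of the length-normalized embeddings."""
    e = length_normalize(enroll)
    t = length_normalize(test)
    return sum(a * b for a, b in zip(e, t))


def plda_llr(enroll, test, between_var, within_var):
    """Log-likelihood ratio (same speaker vs. different speakers).

    Assumption: zero-mean two-covariance model with diagonal covariances,
    so every dimension contributes an independent 2x2 Gaussian term.
    between_var[i] and within_var[i] are the per-dimension variances.
    """
    llr = 0.0
    for x1, x2, b, w in zip(enroll, test, between_var, within_var):
        total = b + w
        # Same speaker: covariance [[b + w, b], [b, b + w]]
        det_same = w * (2.0 * b + w)
        q_same = (total * (x1 * x1 + x2 * x2) - 2.0 * b * x1 * x2) / det_same
        # Different speakers: covariance [[b + w, 0], [0, b + w]]
        q_diff = (x1 * x1 + x2 * x2) / total
        llr += 0.5 * (2.0 * math.log(total) + q_diff)
        llr -= 0.5 * (math.log(det_same) + q_same)
    return llr
```

Cos has no trainable parameters, whereas the PLDA score depends on variances learned from data. This is one way to see why a back-end trained or adapted on in-domain data could help in cross-domain tests, in line with the conjecture above.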


