
A Speaker Verification Backend with Robust Performance across Conditions

by Luciana Ferrer, et al.

In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to result in systems that work poorly on conditions different from those used to train the calibration model. We propose to modify the standard backend, introducing an adaptive calibrator that uses duration and other automatically extracted side-information to adapt to the conditions of the inputs. The backend is trained discriminatively to optimize binary cross-entropy. When trained on a number of diverse datasets that are labeled only with respect to speaker, the proposed backend consistently and, in some cases, dramatically improves calibration, compared to the standard PLDA approach, on a number of held-out datasets, some of which are markedly different from the training data. Discrimination performance is also consistently improved. We show that joint training of the PLDA and the adaptive calibrator is essential – the same benefits cannot be achieved when freezing PLDA and fine-tuning the calibrator. To our knowledge, the results in this paper are the first evidence in the literature that it is possible to develop a speaker verification system with robust out-of-the-box performance on a large variety of conditions.
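To make the contrast between global and adaptive calibration concrete, here is a minimal sketch in Python with NumPy. It is not the authors' backend. It assumes raw PLDA-like scores are already available, fits a single global affine map by gradient descent on binary cross-entropy, and then shows one hypothetical way a calibrator could depend on side-information: the scale and offset become affine functions of a side-information vector such as log-duration. All function names, the learning rate, and the affine form of the dependence are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(llr, labels, prior=0.5):
    """Binary cross-entropy of the posteriors implied by LLRs and a target prior."""
    logit_prior = np.log(prior / (1.0 - prior))
    p = sigmoid(llr + logit_prior)
    eps = 1e-12  # guards against log(0)
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def fit_global_calibration(scores, labels, lr=0.1, steps=2000):
    """Global logistic regression calibration: llr = a * score + b.

    Plain gradient descent on binary cross-entropy with an assumed
    target prior of 0.5; lr and steps are illustrative values.
    """
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = sigmoid(a * scores + b)
        g = p - labels              # gradient of BCE w.r.t. the logit
        a -= lr * np.mean(g * scores)
        b -= lr * np.mean(g)
    return a, b

def adaptive_llr(scores, side, wa, wb):
    """Hypothetical condition-aware calibration.

    The scale and offset are affine functions of the side-information
    (for example, log-duration of the two recordings). This affine form
    is an assumption made for illustration only.
    """
    feats = np.column_stack([np.ones(len(scores)), side])
    scale = feats @ wa
    offset = feats @ wb
    return scale * scores + offset
```

With the weights on the side-information set to zero, `adaptive_llr` reduces to the global calibrator, so the global model is a special case of the adaptive one. The abstract's point about joint training would correspond to optimizing the score-producing parameters together with `wa` and `wb` under the same cross-entropy objective, which this sketch does not attempt.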




A discriminative condition-aware backend for speaker verification

We present a scoring approach for speaker verification that mimics the s...

Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems

Deep speaker embedding extractors have already become new state-of-the-a...

A Speaker Verification Backend for Improved Calibration Performance across Varying Conditions

In a recent work, we presented a discriminative backend for speaker veri...

Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

The performance of speaker verification systems degrades when vocal effo...

STC speaker recognition systems for the NIST SRE 2021

This paper presents a description of STC Ltd. systems submitted to the N...

The IDLAB VoxCeleb Speaker Recognition Challenge 2021 System Description

This technical report describes the IDLab submission for track 1 and 2 o...