Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization

07/15/2020
by Jenthe Thienpondt, et al.

In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge lies in the widely varying degree of phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to fine-tune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. The sample mining technique efficiently exploits the distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced at the domain level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort contains only data from the Farsi target domain, which simulates the enrollment data always being Farsi. If a Gaussian back-end language model detects that the test speaker embedding contains English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF and EER of 0.065 and 1.45%, respectively.
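The score normalization described above can be illustrated with a minimal sketch of symmetric (s-norm) normalization using cosine scoring. This is not the authors' implementation: the function names, the `test_is_english` flag, the `offset` parameter, and the choice to subtract the offset from the larger of the two cohort means are all assumptions made for illustration, as one possible reading of "the maximum expected imposter mean score". In the described system, the flag would come from a language classifier and the offset would be estimated from the AAM-softmax speaker prototypes.

```python
import numpy as np

def cosine_scores(x, cohort):
    """Cosine similarity between one embedding and each row of a cohort matrix."""
    x = x / np.linalg.norm(x)
    cohort = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    return cohort @ x

def s_norm(enroll, test, cohort, test_is_english=False, offset=0.0):
    """Symmetric score normalization against an imposter cohort.

    `test_is_english` and `offset` are hypothetical inputs: the flag stands in
    for a language classifier decision, the offset for a cross-language
    compensation value.
    """
    raw = float(cosine_scores(enroll, test[None, :])[0])

    # Imposter score statistics for each side of the trial.
    e_scores = cosine_scores(enroll, cohort)
    t_scores = cosine_scores(test, cohort)
    mu_e, sd_e = e_scores.mean(), e_scores.std()
    mu_t, sd_t = t_scores.mean(), t_scores.std()

    # Assumed reading: compensate the larger imposter mean for English tests.
    if test_is_english:
        if mu_e >= mu_t:
            mu_e -= offset
        else:
            mu_t -= offset

    # Average of the two z-normalized scores.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)

# Usage with random stand-in embeddings (a real cohort would hold Farsi data).
rng = np.random.default_rng(0)
cohort = rng.normal(size=(50, 16))
enroll = rng.normal(size=16)
test = rng.normal(size=16)
print(s_norm(enroll, test, cohort))
print(s_norm(enroll, test, cohort, test_is_english=True, offset=0.1))
```

Because the offset lowers one imposter mean, a positive offset raises the normalized score of a trial flagged as cross-lingual, which is the direction needed to counter a downward score shift.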


