Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

04/04/2021
by   Chang Zeng, et al.

A back-end model is a key element of modern speaker verification systems. Probabilistic linear discriminant analysis (PLDA) has been widely used as a back-end model in speaker verification. However, it cannot make full use of multiple utterances from enrollment speakers. In this paper, we propose a novel attention-based back-end model that can be used for both text-independent (TI) and text-dependent (TD) speaker verification with multiple enrollment utterances. It employs scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. To verify the proposed attention back-end, we combine it with two completely different but dominant speaker encoders, namely a time delay neural network (TDNN) and ResNet, trained using the additive-margin-based softmax loss and the uniform loss, and compare them with the conventional PLDA and cosine scoring approaches. Experimental results on a multi-genre dataset called CN-Celeb show that our proposed approach outperforms PLDA scoring with TDNN and cosine scoring with ResNet by around 14.1 [...] experiment is also reported in this paper to examine the impact of some significant hyper-parameters of the proposed back-end model.
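To make the idea of attending over multiple enrollment utterances concrete, the following is a minimal sketch and not the authors' model. It assumes identity query/key/value projections (no learned weights), omits the feed-forward self-attention network, and uses mean pooling followed by cosine scoring as a stand-in for the learned back-end. All function names and the toy embeddings are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def self_attention(embeddings):
    """Scaled dot-product self-attention over a list of embeddings.

    Assumption: queries, keys, and values are the embeddings themselves;
    a trained back-end would apply learned projections instead.
    """
    d = len(embeddings[0])
    attended = []
    for q in embeddings:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in embeddings])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, embeddings)) for i in range(d)]
        )
    return attended


def enroll(embeddings):
    """Pool the attended enrollment embeddings into one speaker vector (mean pooling, an assumption)."""
    attended = self_attention(embeddings)
    n, d = len(attended), len(attended[0])
    return [sum(v[i] for v in attended) / n for i in range(d)]


def score(enrollment_embeddings, test_embedding):
    """Cosine similarity between the pooled enrollment vector and a test embedding."""
    return cosine(enroll(enrollment_embeddings), test_embedding)


# Toy 3-dimensional "speaker embeddings" (made-up values for illustration only).
enrollment = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, -0.1]]
target_test = [1.0, 0.1, 0.0]
nontarget_test = [0.0, 1.0, 0.2]

target_score = score(enrollment, target_test)
nontarget_score = score(enrollment, nontarget_test)
print(target_score > nontarget_score)
```

In this sketch each enrollment utterance is re-expressed as a weighted combination of all enrollment utterances, which is the sense in which self-attention can model their intra-relationships before a single score is produced.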


08/03/2020

Self-attention encoding and pooling for speaker recognition

The computing power of mobile devices limits the end-user applications i...

05/16/2020

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 202...

04/22/2022

Unifying Cosine and PLDA Back-ends for Speaker Verification

State-of-art speaker verification (SV) systems use a back-end model to s...

03/10/2022

Parameter-Free Attentive Scoring for Speaker Verification

This paper presents a novel study of parameter-free attentive scoring fo...

06/26/2018

Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping

In this paper we present a new method for text-independent speaker verif...

04/08/2022

Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?

The emergence of large-margin softmax cross-entropy losses in training d...

11/25/2020

Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

In this letter, we propose a vocal tract length (VTL) perturbation metho...