Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

04/04/2021
by Chang Zeng, et al.

A back-end model is a key element of modern speaker verification systems. Probabilistic linear discriminant analysis (PLDA) has been widely used as a back-end model in speaker verification. However, it cannot make full use of multiple utterances from enrollment speakers. In this paper, we propose a novel attention-based back-end model that can be used for both text-independent (TI) and text-dependent (TD) speaker verification with multiple enrollment utterances. It employs scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. To verify the proposed attention back-end, we combine it with two completely different but dominant speaker encoders, a time delay neural network (TDNN) and ResNet, trained using the additive-margin-based softmax loss and the uniform loss, and compare them with the conventional PLDA or cosine scoring approaches. Experimental results on a multi-genre dataset called CN-Celeb show that our proposed approach outperforms PLDA scoring with TDNN and cosine scoring with ResNet by around 14.1 [...]. An experiment is also reported in this paper that examines the impact of some significant hyper-parameters of the proposed back-end model.
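
To make the idea of attending over multiple enrollment utterances concrete, the following is a minimal sketch and not the authors' implementation. It assumes each utterance has already been mapped to a fixed-size embedding by a speaker encoder, applies scaled dot-product self-attention with identity projections (a trained back-end would use learned query, key, and value projections), and then substitutes plain mean pooling for the feed-forward self-attention pooling named in the abstract. Scoring the pooled vector against the test embedding with cosine similarity is likewise an assumption, and all function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention over enrollment embeddings.

    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    attended = []
    for query in embeddings:
        # Each utterance attends to every enrollment utterance, itself included.
        weights = softmax([dot(query, key) / scale for key in embeddings])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, embeddings))
             for i in range(dim)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of vectors (a stand-in for learned attention pooling)."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def score_trial(enroll_embeddings, test_embedding):
    """Aggregate the enrollment set into one vector and score it against the test."""
    speaker = mean_pool(self_attention(enroll_embeddings))
    return cosine(speaker, test_embedding)


# Toy 2-dimensional embeddings, invented purely for illustration.
enroll = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
target_score = score_trial(enroll, [1.0, 0.1])
nontarget_score = score_trial(enroll, [0.0, 1.0])
```

In this toy setup the trial whose test embedding lies close to the enrollment set receives the higher score. A real system would train the attention parameters on sampled enrollment sets rather than use fixed identity projections.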

research
09/01/2022

Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

Conventional automatic speaker verification systems can usually be decom...
research
08/03/2020

Self-attention encoding and pooling for speaker recognition

The computing power of mobile devices limits the end-user applications i...
research
05/16/2020

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 202...
research
03/10/2022

Parameter-Free Attentive Scoring for Speaker Verification

This paper presents a novel study of parameter-free attentive scoring fo...
research
06/26/2018

Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping

In this paper we present a new method for text-independent speaker verif...
research
02/19/2023

Probabilistic Back-ends for Online Speaker Recognition and Clustering

This paper focuses on multi-enrollment speaker recognition which natural...
research
11/08/2018

Phonetic-attention scoring for deep speaker features in speaker verification

Recent studies have shown that frame-level deep speaker features can be ...
