Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition

04/05/2021
by   Jason Pelecanos, et al.

Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or second-order scoring, which, until recently, has not handled utterance-specific uncertainty. In this work we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry, and additional non-linear information. This is achieved by incorporating a second-stage neural network (known as a decision network) as part of an end-to-end training regimen. In particular, we propose the concept of decision residual networks, in which a compact decision network leverages cosine scores and models the residual signal that is still needed. Additionally, we present a modification to the generalized end-to-end softmax loss function to better target the separation of same-speaker and different-speaker scores. Both techniques yielded significant performance gains.
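To make the decision residual idea concrete, the following is a minimal NumPy sketch in which the final score is the cosine score plus a correction from a small network. It is not the authors' architecture: the class name `DecisionResidualScorer`, the concatenated enroll/test input, the single ReLU hidden layer, the layer sizes, and the zero-initialized output layer are all assumptions chosen for illustration, and training is omitted.

```python
import numpy as np


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


class DecisionResidualScorer:
    """Hypothetical scorer: cosine score plus a learned residual term.

    Assumption: the residual network sees the concatenated
    (enroll, test) pair, so swapping the two inputs can change the score.
    """

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(2 * dim, hidden))
        self.b1 = np.zeros(hidden)
        # Assumption: zero-initialized output layer, so the untrained
        # scorer reproduces plain cosine scoring exactly.
        self.w2 = np.zeros(hidden)
        self.b2 = 0.0

    def residual(self, enroll, test):
        """Small MLP that outputs a scalar correction to the cosine score."""
        x = np.concatenate([enroll, test])
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return float(h @ self.w2 + self.b2)

    def score(self, enroll, test):
        """Final decision score: cosine baseline plus residual."""
        return cosine_score(enroll, test) + self.residual(enroll, test)


# Usage with random stand-in embeddings (not real speaker data).
rng = np.random.default_rng(1)
enroll_emb = rng.normal(size=8)
test_emb = rng.normal(size=8)
scorer = DecisionResidualScorer(dim=8)
print(scorer.score(enroll_emb, test_emb))
```

Under these assumptions the network only has to learn what cosine scoring misses, which is one way a compact decision network can add asymmetric and non-linear information without relearning the baseline comparison.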
