Optimization of the Area Under the ROC Curve using Neural Network Supervectors for Text-Dependent Speaker Verification

01/31/2019
by Victoria Mingote, et al.

This paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks. First, we propose a general alignment mechanism that keeps the temporal structure of each phrase and produces a supervector containing both speaker and phrase information, since both are relevant for text-dependent verification. We show that different alignment techniques can replace average pooling and provide significant gains in performance. Second, we present a novel back-end approach that trains a neural network for detection tasks by optimizing the Area Under the Curve (AUC) as an alternative to the usual triplet loss function. The resulting system is end-to-end, with a cost function close to our desired measure of performance. As the experimental section shows, this approach improves system performance, since our triplet AUC neural network learns to discriminate between pairs of examples from the same identity and pairs from different identities. The different alignment techniques for producing supervectors, together with the new back-end approach, were tested on the RSR2015-Part I database for text-dependent speaker verification. They provide competitive results compared to networks of similar size that use average pooling to extract supervectors and either a simple back-end or triplet loss training.
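A minimal sketch in plain Python of the two ideas in the abstract, under stated assumptions: an alignment-based supervector (state-wise frame means concatenated in phrase order) compared with average pooling, and a sigmoid-smoothed AUC over target and non-target trial scores that could serve as a differentiable training objective. The uniform segmentation used as the alignment, the sigmoid surrogate with slope `alpha`, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def average_pooling(frames: Matrix) -> List[float]:
    """Mean over time: the result ignores the order of the frames."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def uniform_alignment(num_frames: int, num_states: int) -> Matrix:
    """Hypothetical hard alignment: split the utterance into equal segments,
    one per state (a stand-in for an HMM or learned alignment)."""
    align = [[0.0] * num_states for _ in range(num_frames)]
    for t in range(num_frames):
        state = min(t * num_states // num_frames, num_states - 1)
        align[t][state] = 1.0
    return align


def supervector(frames: Matrix, align: Matrix) -> List[float]:
    """Per-state weighted mean of the frames, concatenated state by state."""
    num_states, dim = len(align[0]), len(frames[0])
    sv: List[float] = []
    for k in range(num_states):
        weight = sum(a[k] for a in align)
        for d in range(dim):
            acc = sum(a[k] * f[d] for a, f in zip(align, frames))
            sv.append(acc / weight if weight > 0 else 0.0)
    return sv


def empirical_auc(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Fraction of (target, non-target) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for t in target:
        for n in nontarget:
            total += 1.0 if t > n else (0.5 if t == n else 0.0)
    return total / (len(target) * len(nontarget))


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def smooth_auc(target: Sequence[float], nontarget: Sequence[float],
               alpha: float = 10.0) -> float:
    """Assumed differentiable surrogate: the step function in the AUC is
    replaced by a sigmoid with slope alpha."""
    total = 0.0
    for t in target:
        for n in nontarget:
            total += _sigmoid(alpha * (t - n))
    return total / (len(target) * len(nontarget))


if __name__ == "__main__":
    frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    align = uniform_alignment(len(frames), 2)
    print(average_pooling(frames), average_pooling(frames[::-1]))
    print(supervector(frames, align), supervector(frames[::-1], align))
    print(empirical_auc([0.9, 0.8], [0.1, 0.85]))
```

Reversing the frame order leaves the average-pooled vector unchanged but permutes the supervector, which illustrates how an alignment can keep phrase-order information that pooling discards. Maximizing `smooth_auc` (or minimizing `1 - smooth_auc`) in an autodiff framework would push target scores above non-target scores; as `alpha` grows, the surrogate approaches the empirical AUC.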

12/22/2018

Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

In this paper, we propose a new differentiable neural network alignment ...
11/06/2021

Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems

This paper explores three novel approaches to improve the performance of...
08/06/2019

An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network

This paper presents an end-to-end text-independent speaker verification ...
10/22/2020

Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

This paper describes our submission to Task 1 of the Short-duration Spea...
08/06/2019

Triplet Based Embedding Distance and Similarity Learning for Text-independent Speaker Verification

Speaker embeddings are becoming increasingly popular in the text-independent speake...
05/10/2021

Study on the temporal pooling used in deep neural networks for speaker verification

The x-vector architecture has recently achieved state-of-the-art results...
12/27/2018

Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition

In this paper we propose a method to model speaker and session variabili...
