Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

12/22/2018
by Victoria Mingote, et al.

In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification, which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with similar approaches, we do not obtain the utterance embedding by averaging over the temporal dimension. Instead, our system replaces the mean with a phrase alignment model that preserves the temporal structure of each phrase. This structure is relevant in this application because the phonetic information is part of the identity in the verification task. Moreover, we can use a convolutional neural network as the front-end and, because the alignment process is differentiable, train the whole network to produce a supervector for each utterance that is discriminative with respect to the speaker and the phrase simultaneously. As we show, the resulting supervector encodes both phrase and speaker information, providing good performance in text-dependent speaker verification tasks. In this work, verification is performed with a basic similarity metric, chosen for its simplicity over the more elaborate models that are commonly used. The new alignment-based supervector model was tested on the RSR2015-Part I database for text-dependent speaker verification, providing competitive results compared to networks of similar size that use the mean to extract embeddings.
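To illustrate the contrast between mean reduction and alignment-based pooling, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the alignment model, the feature dimensions, or the exact similarity metric, so the fixed two-state hard alignment, the toy two-dimensional frames, the cosine similarity, and the function names `mean_embedding`, `supervector`, and `cosine_similarity` are all assumptions made for illustration.

```python
import math


def mean_embedding(frames):
    """Average the frames over time; the temporal order is discarded."""
    num_frames, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]


def supervector(frames, alignment, eps=1e-8):
    """Concatenate one weighted frame average per alignment state.

    frames:    T x D list of frame-level feature vectors
    alignment: T x Q list of per-frame state weights (assumed given here;
               one-hot rows for a hard alignment, posteriors for a soft one)
    For a fixed alignment the result is a weighted sum of the frames, so it
    is differentiable with respect to the frame features.
    """
    num_frames, dim = len(frames), len(frames[0])
    num_states = len(alignment[0])
    out = []
    for q in range(num_states):
        weight = sum(alignment[t][q] for t in range(num_frames))
        for d in range(dim):
            acc = sum(alignment[t][q] * frames[t][d] for t in range(num_frames))
            out.append(acc / (weight + eps))
    return out


def cosine_similarity(a, b):
    """Basic similarity metric (an assumed choice) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two toy "utterances" with the same frames in opposite temporal order.
frames_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
frames_b = list(reversed(frames_a))

# Hypothetical left-to-right hard alignment: frames 0-1 -> state 0, frames 2-3 -> state 1.
alignment = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

mean_a, mean_b = mean_embedding(frames_a), mean_embedding(frames_b)
sv_a, sv_b = supervector(frames_a, alignment), supervector(frames_b, alignment)

mean_score = cosine_similarity(mean_a, mean_b)
supervector_score = cosine_similarity(sv_a, sv_b)
```

In this toy case the two mean embeddings are identical, so `mean_score` is essentially 1, while the two supervectors are orthogonal, so `supervector_score` is 0. The supervector has `Q * D` entries (here 4), one block per alignment state, which is how the temporal structure of the phrase is retained.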


