End-to-End Attention based Text-Dependent Speaker Verification

01/03/2017
by Shi-Xiong Zhang, et al.

This paper presents a new type of end-to-end system for text-dependent speaker verification. Previously, using phonetically discriminative or speaker-discriminative DNNs as feature extractors for speaker verification has shown promising results. In those systems, the extracted frame-level features (DNN bottleneck, posterior, or d-vector) are weighted equally and aggregated to compute an utterance-level speaker representation (d-vector or i-vector). In this work, we use speaker-discriminative CNNs to extract noise-robust frame-level features. These features are combined into an utterance-level speaker vector through an attention mechanism instead of equal weighting. The proposed attention model uses both speaker-discriminative information and phonetic information to learn the weights. The whole system, including the CNN and the attention model, is jointly optimized using an end-to-end criterion. The training algorithm mirrors the evaluation process exactly: it maps a test utterance and a few target-speaker utterances directly to a single verification score. The algorithm can also automatically select the most similar impostor for each target speaker to train the network. We demonstrate the effectiveness of the proposed end-to-end system on the Windows 10 "Hey Cortana" speaker verification task.
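The difference between equal weighting and attention-based aggregation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the attention form, so the linear scoring vectors `w_spk` and `w_pho`, the softmax normalization, the averaging of enrollment vectors, and the cosine score are all assumptions chosen for illustration, and random arrays stand in for the CNN and phonetic features.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mean_pool(frames):
    # Equal weighting: every one of the T frames contributes 1/T.
    return frames.mean(axis=0)

def attention_pool(frames, phonetic, w_spk, w_pho):
    # Hypothetical attention: one scalar score per frame, computed from
    # speaker features and phonetic features, normalized with softmax.
    scores = frames @ w_spk + phonetic @ w_pho   # shape (T,)
    weights = softmax(scores)                    # non-negative, sums to 1
    return weights @ frames, weights             # utterance vector, weights

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verification_score(test_vec, enroll_vecs):
    # Assumed scoring: average the enrollment vectors into a speaker
    # model and compare it with the test vector by cosine similarity.
    model = np.mean(enroll_vecs, axis=0)
    return cosine(test_vec, model)

# Random stand-ins for frame-level features (assumed sizes).
rng = np.random.default_rng(0)
T, D, P = 50, 8, 4                      # frames, speaker dim, phonetic dim
frames = rng.normal(size=(T, D))
phonetic = rng.normal(size=(T, P))
w_spk = rng.normal(size=D)
w_pho = rng.normal(size=P)

utt_vec, weights = attention_pool(frames, phonetic, w_spk, w_pho)
enroll = np.stack([utt_vec + 0.1 * rng.normal(size=D) for _ in range(3)])
score = verification_score(utt_vec, enroll)
```

With all-zero scoring vectors the weights become uniform and `attention_pool` reduces to `mean_pool`, which is the equal-weighting baseline described above. In an end-to-end setup, the scoring parameters and the feature extractor would be trained together against a loss on `score`.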

research
05/26/2017

Text-Independent Speaker Verification Using 3D Convolutional Neural Networks

In this paper, a novel method using 3D Convolutional Neural Network (3D-...
research
08/20/2020

Speaker-Utterance Dual Attention for Speaker and Utterance Verification

In this paper, we study a novel technique that exploits the interaction ...
research
11/08/2018

Phonetic-attention scoring for deep speaker features in speaker verification

Recent studies have shown that frame-level deep speaker features can be ...
research
11/10/2020

Supervised attention for speaker recognition

The recently proposed self-attentive pooling (SAP) has shown good perfor...
research
10/06/2017

End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA

Recently several end-to-end speaker verification systems based on deep n...
research
09/01/2022

Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

Conventional automatic speaker verification systems can usually be decom...
research
12/22/2018

Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

In this paper, we propose a new differentiable neural network alignment ...
