Investigation of Frame Alignments for GMM-based Text-prompted Speaker Verification

10/28/2017
by Yi Liu, et al.

Frame alignment plays an important role in GMM-based speaker verification. In text-prompted speaker verification, it is common practice to use the transcriptions to align speech frames to phonetic units. In this paper, we compare the performance of alignments produced by a hidden Markov model (HMM) and a deep neural network (DNN), using the same training data and phonetic units. We incorporate a phonetic Gaussian mixture model (PGMM) in the DNN-based alignment, making the total number of Gaussian mixtures equal to that of the HMM. Based on the Kullback-Leibler divergence (KLD) between the HMM and DNN alignments, we also present a fast and efficient way to verify the text content. Our experiments on RSR2015 Part-3 show that, even with a small training set, the DNN-based alignment outperforms the HMM alignment. The results also show that using the DNN posteriors in the HMM system is not effective. This indicates that under clean conditions (e.g., RSR2015), text-prompted speaker verification does not benefit from the use of transcriptions. However, the prompted text can still be used as a pass-phrase to enhance security. The content verification experiment demonstrates the effectiveness of our proposed KLD-based method.
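
The abstract does not spell out how the KLD-based content check is computed, so the following is only a minimal sketch of one plausible reading: treat each alignment as a per-frame probability distribution over the same phonetic units, average the frame-level KL divergence, and accept the utterance when the average is small. The function names, the direction of the divergence (HMM relative to DNN), the flooring constant `eps`, and the threshold value are all assumptions made for illustration, not details taken from the paper.

```python
import math


def frame_kld(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions over the same phonetic units.

    `eps` floors q to avoid division by zero (an assumption of this sketch).
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same phonetic units")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def mean_alignment_kld(hmm_frames, dnn_frames, eps=1e-10):
    """Average frame-level KL divergence between two alignments."""
    if len(hmm_frames) != len(dnn_frames) or not hmm_frames:
        raise ValueError("alignments must be non-empty and equally long")
    klds = [frame_kld(p, q, eps) for p, q in zip(hmm_frames, dnn_frames)]
    return sum(klds) / len(klds)


def content_matches(hmm_frames, dnn_frames, threshold=1.0):
    """Accept the utterance if the two alignments agree closely enough.

    The threshold is hypothetical and would be tuned on held-out data.
    """
    return mean_alignment_kld(hmm_frames, dnn_frames) < threshold


# Toy posteriors over three phonetic units, two frames each (made-up numbers).
hmm = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
dnn_same_text = [[0.8, 0.15, 0.05], [0.2, 0.7, 0.1]]
dnn_other_text = [[0.05, 0.05, 0.9], [0.9, 0.05, 0.05]]

print(content_matches(hmm, dnn_same_text))
print(content_matches(hmm, dnn_other_text))
```

The intuition behind this sketch is that the HMM alignment is constrained by the prompted transcription while the DNN posteriors depend only on the audio, so a spoken text that differs from the prompt should make the two disagree and raise the average divergence.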

research
12/26/2021

Novel Hybrid DNN Approaches for Speaker Verification in Emotional and Stressful Talking Environments

In this work, we conducted an empirical comparative study of the perform...
research
02/03/2021

Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

In this paper, we propose a novel method that trains pass-phrase specifi...
research
09/28/2018

Spoken Pass-Phrase Verification in the i-vector Space

The task of spoken pass-phrase verification is to decide whether a test ...
research
10/11/2018

Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments

This research is an effort to present an effective approach to enhance t...
research
09/27/2016

Decision Making Based on Cohort Scores for Speaker Verification

Decision making is an important component in a speaker verification syst...
research
10/11/2016

GMM-Free Flat Start Sequence-Discriminative DNN Training

Recently, attempts have been made to remove Gaussian mixture models (GMM...
research
05/31/2020

Crossed-Time Delay Neural Network for Speaker Recognition

Time Delay Neural Network (TDNN) is a well-performing structure for DNN-...
