Deep neural network based i-vector mapping for speaker verification using short utterances

10/16/2018
by Jinxi Guo, et al.

Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector based systems have become the standard in speaker verification applications, but they are less effective with short utterances. In this paper, we first compare two state-of-the-art universal background model training methods for i-vector modeling on full-length and short-utterance evaluation tasks: a Gaussian mixture model (GMM) based method and a deep neural network (DNN) based method. The results indicate that the I-vector_DNN system outperforms the I-vector_GMM system under various durations. However, the performance of both systems degrades significantly as the duration of the utterances decreases. To address this issue, we propose two novel nonlinear mapping methods which train DNN models to map the i-vectors extracted from short utterances to their corresponding long-utterance i-vectors. The mapped i-vectors can restore missing information and reduce the variance of the original short-utterance i-vectors. Both proposed methods model the joint representation of short- and long-utterance i-vectors using an autoencoder. Experimental results on the NIST SRE 2010 dataset show that both methods provide significant improvement, with a maximum relative improvement of 28.43% in equal error rate (EER) over a baseline system when using a deep encoder with residual blocks and an additional phoneme vector. When the best-validated SRE10 models are further tested on the Speaker In The Wild dataset, the methods yield a 23.12% improvement under short-utterance conditions.
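The core idea of the abstract, learning a nonlinear map from short-utterance i-vectors to their long-utterance counterparts, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: the data are synthetic Gaussian vectors standing in for i-vectors, the "short" vectors are simply noisy copies of the "long" ones, and the mapper is a single hidden layer with one skip connection trained by plain gradient descent on mean squared error. It is not the authors' architecture (which models a joint representation with an autoencoder, uses residual blocks, and adds a phoneme vector), and the function names are hypothetical.

```python
import numpy as np

def make_pairs(n, dim, noise_std, rng):
    """Synthetic stand-in: 'long' vectors plus noise give 'short' vectors."""
    long_vecs = rng.standard_normal((n, dim))
    short_vecs = long_vecs + noise_std * rng.standard_normal((n, dim))
    return short_vecs, long_vecs

def init_params(dim, hidden, rng):
    return {
        "W1": 0.1 * rng.standard_normal((dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": np.zeros((hidden, dim)),  # zero init: the mapper starts as the identity
        "b2": np.zeros(dim),
    }

def apply_mapper(p, X):
    """Residual mapping: input plus a learned nonlinear correction."""
    H = np.tanh(X @ p["W1"] + p["b1"])
    return X + H @ p["W2"] + p["b2"]

def train_mapper(X, Y, hidden=32, lr=0.1, steps=500, seed=0):
    """Full-batch gradient descent on the mean squared error to the long vectors."""
    rng = np.random.default_rng(seed)
    p = init_params(X.shape[1], hidden, rng)
    for _ in range(steps):
        H = np.tanh(X @ p["W1"] + p["b1"])
        E = X + H @ p["W2"] + p["b2"] - Y       # prediction error
        G = 2.0 * E / E.size                    # d(loss)/d(prediction)
        dZ = (G @ p["W2"].T) * (1.0 - H ** 2)   # backprop through tanh
        grads = {"W1": X.T @ dZ, "b1": dZ.sum(0), "W2": H.T @ G, "b2": G.sum(0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p

def mse(A, B):
    return float(np.mean((A - B) ** 2))

rng = np.random.default_rng(0)
X_train, Y_train = make_pairs(2000, 8, 1.0, rng)   # hypothetical sizes
X_test, Y_test = make_pairs(500, 8, 1.0, rng)
params = train_mapper(X_train, Y_train)
mse_raw = mse(X_test, Y_test)
mse_mapped = mse(apply_mapper(params, X_test), Y_test)
print(f"held-out MSE raw={mse_raw:.3f} mapped={mse_mapped:.3f}")
```

On this toy data the mapped vectors should sit closer to the long-utterance targets than the raw short-utterance vectors do, which mirrors the abstract's claim that mapping reduces the variance of short-utterance i-vectors. A real system would score the mapped i-vectors with a verification back end and report EER rather than MSE.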


research
01/29/2019

Quality Measures for Speaker Verification with Short Utterances

The performances of the automatic speaker verification (ASV) systems deg...
research
09/17/2018

Generative x-vectors for text-independent speaker verification

Speaker verification (SV) systems using deep neural network embeddings, ...
research
04/16/2019

Spoof detection using x-vector and feature switching

Detecting spoofed utterances is a fundamental problem in voice-based bio...
research
02/14/2020

Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

Speaker recognition systems based on deep speaker embeddings have achiev...
research
04/06/2020

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

In realistic settings, a speaker recognition system needs to identify a ...
research
04/08/2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

Speaker verification systems usually suffer from the mismatch problem be...
research
09/29/2017

Language-depedent I-Vectors for LRE15

A standard recipe for spoken language recognition is to apply a Gaussian...
