Full-info Training for Deep Speaker Feature Learning

10/31/2017
by Lantian Li, et al.

Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional and time-delay deep neural network (CT-DNN) model. By forcing the model to discriminate among the speakers in the training data, frame-level speaker features can be derived from the last hidden layer. Despite its good performance, a potential problem of the present model is that it involves a parametric classifier, i.e., the last affine layer, which may absorb some of the discriminative knowledge, leading to 'information leak' for feature learning. This paper presents a full-info training approach that discards the parametric classifier and forces all the discriminative knowledge to be learned by the feature net. Our experiments on the Fisher database demonstrate that this new training scheme can produce more coherent features, leading to consistent and notable performance improvement on the speaker verification task.
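To make the contrast with a parametric classifier concrete, here is a minimal sketch. It assumes, purely for illustration, that the trainable last affine layer is replaced by per-speaker mean vectors (centroids) of the feature net's own frame-level outputs, so the classification loss depends only on the features; this is not necessarily the paper's exact procedure. It also assumes utterance-level vectors are obtained by averaging frame-level features and scored with cosine similarity. All function names (`speaker_centroids`, `nonparametric_ce`, `utterance_vector`, `cosine`) are hypothetical, and the code only evaluates the loss value; no gradients are computed.

```python
import math
from collections import defaultdict


def speaker_centroids(features, labels):
    """Hypothetical helper: average the frame-level features of each speaker."""
    sums = {}
    counts = defaultdict(int)
    for f, spk in zip(features, labels):
        if spk not in sums:
            sums[spk] = [0.0] * len(f)
        sums[spk] = [s + x for s, x in zip(sums[spk], f)]
        counts[spk] += 1
    return {spk: [s / counts[spk] for s in vec] for spk, vec in sums.items()}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nonparametric_ce(features, labels):
    """Mean cross-entropy whose logits are dot products with speaker centroids
    (assumed nonparametric classifier) instead of a trainable affine layer."""
    cents = speaker_centroids(features, labels)
    speakers = sorted(cents)
    total = 0.0
    for f, spk in zip(features, labels):
        logits = [dot(f, cents[s]) for s in speakers]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[speakers.index(spk)]
    return total / len(features)


def utterance_vector(frame_features):
    """Assumed utterance-level vector: the mean of the frame-level features."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]


def cosine(a, b):
    """Cosine similarity, used here as an assumed verification score."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

With toy 2-D features such as `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` labelled `A, A, B, B`, the loss falls below log 2, while scrambled labels make the centroids coincide and the loss rises to log 2. Because no separate weight matrix exists in this sketch, a low loss can only come from the features themselves being discriminative, which is the intuition behind avoiding the 'information leak'.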

Related research
05/10/2017

Deep Speaker Feature Learning for Text-independent Speaker Verification

Recently deep neural networks (DNNs) have been used to learn speaker fea...
06/22/2017

Deep Speaker Verification: Do We Need End to End?

End-to-end learning treats the entire system as a whole adaptable black ...
11/15/2017

Human and Machine Speaker Recognition Based on Short Trivial Events

Trivial events are ubiquitous in human to human conversations, e.g., cou...
06/06/2014

Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation

Unlike unsupervised approaches such as autoencoders that learn to recons...
11/08/2018

Gaussian-Constrained training for speaker verification

Neural models, in particular the d-vector and x-vector architectures, ha...
06/22/2017

Speaker Recognition with Cough, Laugh and "Wei"

This paper proposes a speaker recognition (SRE) task with trivial speech...
04/07/2021

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

Generative probability models are widely used for speaker verification (...