Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

by Xugang Lu, et al.

Generative probability models are widely used for speaker verification (SV). However, generative models lack the ability to select discriminative features. As a hypothesis test, SV can be regarded as a binary classification task, which can be implemented as a Siamese neural network (SiamNN) with discriminative training. However, most discriminative training for SiamNNs considers only the distribution of pairwise sample distances and ignores the additional discriminative information in the joint distribution of samples. In this paper, we propose a novel SiamNN that takes the joint distribution of samples into account. The joint distribution of samples is first formulated with a joint Bayesian (JB) generative model; a SiamNN is then designed with dense layers that approximate the factorized affine transforms used in the JB model. After initializing the SiamNN with the learned parameters of the JB model, we further train the model parameters on pairwise samples as a binary discrimination task for SV. We carried out SV experiments on the Speakers in the Wild (SITW) and VoxCeleb corpora. Experimental results showed that our proposed model outperformed state-of-the-art SV models by a large margin.
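To make the generative starting point concrete, the following is a minimal sketch of how a pair of speaker embeddings can be scored with a joint Bayesian model. It assumes the standard JB formulation, which may differ in detail from the one used in the paper: each zero-mean embedding is the sum of a speaker-identity component with covariance `s_mu` and a within-speaker variation component with covariance `s_eps`. The names `gaussian_logpdf`, `jb_llr`, `s_mu`, and `s_eps` are illustrative and do not come from the paper.

```python
import numpy as np


def gaussian_logpdf(x, cov):
    """Log density of a zero-mean Gaussian with covariance `cov` at `x`."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = x @ np.linalg.solve(cov, x)  # Mahalanobis term x^T cov^{-1} x
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def jb_llr(x1, x2, s_mu, s_eps):
    """Log-likelihood ratio: same-speaker vs. different-speaker hypothesis.

    Assumption: x = mu + eps with mu ~ N(0, s_mu) and eps ~ N(0, s_eps).
    Under the same-speaker hypothesis the two embeddings share mu, so their
    cross-covariance is s_mu; under the different-speaker hypothesis it is 0.
    """
    total = s_mu + s_eps
    zeros = np.zeros_like(s_mu)
    cov_same = np.block([[total, s_mu], [s_mu, total]])
    cov_diff = np.block([[total, zeros], [zeros, total]])
    pair = np.concatenate([x1, x2])
    return gaussian_logpdf(pair, cov_same) - gaussian_logpdf(pair, cov_diff)


# Toy 2-dimensional example with hypothetical covariances.
s_mu = 4.0 * np.eye(2)   # between-speaker covariance (assumed)
s_eps = 1.0 * np.eye(2)  # within-speaker covariance (assumed)
a = np.array([2.0, 0.0])
score_same = jb_llr(a, a, s_mu, s_eps)    # identical embeddings
score_diff = jb_llr(a, -a, s_mu, s_eps)   # opposite embeddings
```

In this toy setting the identical pair receives a positive score and the opposite pair a negative one. Because the exponent of each Gaussian is a quadratic form in the pair, the score reduces to quadratic and bilinear terms in the two embeddings; it is the matrices of these terms that dense layers in a SiamNN can approximate in factorized form and then fine-tune discriminatively.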


