Analysis of Length Normalization in End-to-End Speaker Verification System

06/08/2018
by Weicheng Cai, et al.

The classical i-vectors and the latest end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems. Traditionally, once i-vectors or deep speaker embeddings are extracted, we rely on an extra length normalization step to normalize the representations to unit length before back-end modeling. In this paper, we explore how a neural network can learn length-normalized deep speaker embeddings in an end-to-end manner. To this end, we add a length normalization layer followed by a scale layer before the output layer of a common classification network. We conduct experiments on the verification task of the VoxCeleb1 dataset. The results show that integrating this simple step into the end-to-end training pipeline significantly boosts speaker verification performance. In the testing stage of our L2-normalized end-to-end system, a simple inner product achieves state-of-the-art performance.
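
The two steps named in the abstract, a length normalization layer followed by a scale layer, and the inner-product scoring used at test time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-dimensional embeddings, and the scale value `alpha=10.0` are assumptions made for illustration, and the abstract does not state whether the scale is fixed or learned.

```python
import math


def l2_normalize(x, eps=1e-12):
    """Rescale a vector to unit L2 norm (length normalization)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def scaled_embedding(x, alpha):
    """Length normalization followed by a scale layer.

    alpha is a hypothetical scale factor; the output has L2 norm alpha
    regardless of the norm of the input.
    """
    return [alpha * v for v in l2_normalize(x)]


def inner_product_score(enroll, test):
    """Score a trial as the inner product of two unit-length embeddings.

    For unit-length vectors this equals the cosine similarity.
    """
    a = l2_normalize(enroll)
    b = l2_normalize(test)
    return sum(p * q for p, q in zip(a, b))


# Toy embeddings (assumed values, not taken from the paper).
emb_a = [3.0, 4.0]
emb_b = [6.0, 8.0]   # same direction as emb_a, different length
emb_c = [-4.0, 3.0]  # orthogonal to emb_a

print(scaled_embedding(emb_a, alpha=10.0))
print(inner_product_score(emb_a, emb_b))
print(inner_product_score(emb_a, emb_c))
```

Because both embeddings are normalized before scoring, the score depends only on the angle between them: `emb_a` and `emb_b` differ in length but score as the same direction, while the orthogonal `emb_c` scores near zero.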

research
01/20/2020

Pairwise Discriminative Neural PLDA for Speaker Verification

The state-of-art approach to speaker verification involves the extractio...
research
04/14/2018

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

In this paper, we explore the encoding/pooling layer and loss function i...
research
02/27/2018

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model

Embeddings in machine learning are low-dimensional representations of co...
research
04/17/2019

RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

Recently, direct modeling of raw waveforms using deep neural networks ha...
research
06/19/2019

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

In this paper, we propose a new pooling method called spatial pyramid en...
research
07/27/2020

Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

One of the most important parts of an end-to-end speaker verification sy...
research
09/28/2020

Siamese Capsule Network for End-to-End Speaker Recognition In The Wild

We propose an end-to-end deep model for speaker verification in the wild...
