Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification

02/21/2019
by Yun Tang, et al.

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both a time delay neural network (TDNN) and long short-term memory (LSTM) neural networks to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and the SRE 2018 dev test (with a 9% EER reduction), as well as a more than 10% reduction in DCF scores on these two test sets over the x-vector baseline.
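To make the multi-level pooling idea in point (2) concrete, here is a minimal sketch in plain Python. It assumes mean-plus-standard-deviation statistics pooling (as in the x-vector baseline) applied separately to frame-level outputs from a TDNN layer and an LSTM layer, with the two pooled vectors concatenated. The abstract does not state the exact pooling operation or which layers are pooled, so the function names `stats_pool` and `multi_level_pool` and the toy inputs are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def stats_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Collapse T frame vectors of size D into one vector of size 2*D:
    per-dimension mean followed by per-dimension population std."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def multi_level_pool(
    tdnn_frames: Sequence[Sequence[float]],
    lstm_frames: Sequence[Sequence[float]],
) -> Vector:
    """Assumed multi-level pooling: pool each level on its own,
    then concatenate the utterance-level statistics."""
    return stats_pool(tdnn_frames) + stats_pool(lstm_frames)


# Toy frame-level outputs (hypothetical values, not real network activations).
tdnn = [[1.0, 2.0], [3.0, 4.0]]   # 2 frames, 2 dimensions
lstm = [[0.0], [0.0], [3.0]]      # 3 frames, 1 dimension

pooled = multi_level_pool(tdnn, lstm)
print(len(pooled))  # 2*2 + 2*1 = 6 values
```

The pooled vector has a fixed length regardless of how many frames each level produced, which is what lets a following embedding layer operate on variable-length utterances.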

research
01/14/2020

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

This paper presents an improved deep embedding learning method based on ...
research
03/28/2019

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification

In this paper, gating mechanisms are applied in deep neural network (DNN...
research
08/12/2021

Xi-Vector Embedding for Speaker Recognition

We present a Bayesian formulation for deep speaker embedding, wherein th...
research
08/14/2020

End-to-End Trainable Self-Attentive Shallow Network for Text-Independent Speaker Verification

Generalized end-to-end (GE2E) model is widely used in speaker verificati...
research
09/14/2016

TristouNet: Triplet Loss for Speaker Turn Embedding

TristouNet is a neural network architecture based on Long Short-Term Mem...
research
06/23/2022

DeepSafety:Multi-level Audio-Text Feature Extraction and Fusion Approach for Violence Detection in Conversations

Natural Language Processing has recently made understanding human intera...
research
06/19/2019

Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

In this paper, we propose a new pooling method called spatial pyramid en...
