An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

01/14/2020
by Bin Gu, et al.

This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution (MSCNN) is adopted in the frame-level layers to capture complementary speaker information in different receptive fields. (2) A Baum-Welch statistics attention (BWSA) mechanism is applied in the temporal pooling layer to integrate more useful long-term speaker characteristics. Experiments are carried out on the NIST SRE16 evaluation set. The results demonstrate the effectiveness of MSCNN and show that the proposed BWSA can further improve the performance of the DNN embedding system.
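
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The kernel widths (3, 5, 7), the ReLU after each branch, the feature dimensions, and the helper names `conv1d_same`, `multi_scale_conv`, and `attentive_stats_pooling` are all assumptions made for illustration. The abstract does not say how BWSA derives its attention weights from Baum-Welch statistics, so the per-frame scores here are random placeholders passed in from outside.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Slide kernel (K, D_in, D_out) over frames x (T, D_in); zero-pad so T is kept."""
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)))
    # Stack the K time-shifted views into windows of shape (T, K, D_in).
    windows = np.stack([padded[i:i + x.shape[0]] for i in range(k)], axis=1)
    return np.einsum("tkd,kdo->to", windows, kernel)


def multi_scale_conv(x, kernels):
    """Parallel branches with different receptive fields, concatenated per frame."""
    # ReLU per branch is an assumption, not something stated in the abstract.
    branches = [np.maximum(conv1d_same(x, w), 0.0) for w in kernels]
    return np.concatenate(branches, axis=1)


def attentive_stats_pooling(h, scores):
    """Pool frames h (T, D) into [weighted mean, weighted std] using scores (T,)."""
    w = np.exp(scores - scores.max())
    w /= w.sum()  # softmax over the time axis
    mean = w @ h
    var = w @ (h ** 2) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))  # clip guards against tiny negatives
    return np.concatenate([mean, std])


# Illustrative usage with random data (all sizes are hypothetical).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 24))  # 200 frames, 24-dim features
kernels = [rng.standard_normal((k, 24, 16)) * 0.1 for k in (3, 5, 7)]
hidden = multi_scale_conv(frames, kernels)  # shape (200, 48)
scores = rng.standard_normal(200)  # placeholder for BWSA-derived scores
pooled = attentive_stats_pooling(hidden, scores)  # shape (96,)
```

With uniform scores the pooling reduces to the plain mean and standard deviation over frames, which is the statistics pooling used in the standard x-vector setup; non-uniform scores let some frames contribute more to the utterance-level representation.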

research
03/28/2019

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification

In this paper, gating mechanisms are applied in deep neural network (DNN...
research
02/21/2019

Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification

This paper aims to improve the widely used deep speaker embedding x-vect...
research
08/30/2021

RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification

The convolutional neural network (CNN) based approaches have shown great...
research
05/10/2021

Study on the temporal pooling used in deep neural networks for speaker verification

The x-vector architecture has recently achieved state-of-the-art results...
research
10/16/2019

Frequency and temporal convolutional attention for text-independent speaker recognition

Majority of the recent approaches for text-independent speaker recogniti...
research
08/12/2021

Xi-Vector Embedding for Speaker Recognition

We present a Bayesian formulation for deep speaker embedding, wherein th...
research
06/25/2021

Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification

This paper proposes a multi-task learning network with phoneme-aware and...
