Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

07/27/2020
by   Soonshin Seo, et al.
0

One of the most important parts of an end-to-end speaker verification system is the speaker embedding generation. In our previous paper, we reported that shortcut connections-based multi-layer aggregation improves the representational power of the speaker embedding. However, the number of model parameters is relatively large and the unspecified variations increase in the multi-layer aggregation. Therefore, we propose a self-attentive multi-layer aggregation with feature recalibration and normalization for end-to-end speaker verification system. To reduce the number of model parameters, the ResNet, which scaled channel width and layer depth, is used as a baseline. To control the variability in the training, a self-attention mechanism is applied to perform the multi-layer aggregation with dropout regularizations and batch normalizations. Then, a feature recalibration layer is applied to the aggregated feature using fully-connected layers and nonlinear activation functions. Deep length normalization is also used on a recalibrated feature in the end-to-end training process. Experimental results using the VoxCeleb1 evaluation dataset showed that the performance of the proposed methods was comparable to that of state-of-the-art models (equal error rate of 4.95 2.86

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/03/2022

Dynamic Kernels and Channel Attention with Multi-Layer Embedding Aggregation for Speaker Verification

State-of-the-art speaker verification frameworks have typically focused ...
research
07/14/2021

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding

This paper proposes a serialized multi-layer multi-head attention for ne...
research
06/08/2018

Analysis of Length Normalization in End-to-End Speaker Verification System

The classical i-vectors and the latest end-to-end deep speaker embedding...
research
01/28/2020

Masked cross self-attention encoding for deep speaker embedding

In general, speaker verification tasks require the extraction of speaker...
research
06/26/2022

Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Pooling is needed to aggregate frame-level features into utterance-level...
research
05/20/2023

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

In this paper, we propose ACA-Net, a lightweight, global context-aware s...
research
06/04/2020

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

End-to-end speaker diarization using a fully supervised self-attention m...

Please sign up or login with your details

Forgot password? Click here to reset