Masked cross self-attention encoding for deep speaker embedding

01/28/2020
by Soonshin Seo, et al.

In general, speaker verification tasks require the extraction of a speaker embedding from a deep neural network. Because a speaker embedding may contain additional information, such as noise, besides speaker information, its variability needs to be controlled. Our previous model used multiple pooling based on shortcut connections to amplify speaker information by deepening the dimension; however, the variability problem remains. In this paper, we propose a masked cross self-attention encoding (MCSAE) for deep speaker embedding. This method controls the variability of the speaker embedding by making the masked outputs of the multiple pooling layers attend to one another. The output of the MCSAE is used to construct the deep speaker embedding. Experimental results on the VoxCeleb dataset demonstrate that the proposed approach improves performance compared with previous state-of-the-art models.
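The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch sketch of the stated idea: two pooled outputs are randomly masked and attend to each other, and the re-weighted vectors are concatenated into the deep speaker embedding. The module name, layer sizes, masking scheme, and attention form are illustrative assumptions, not the authors' exact MCSAE.

# Minimal sketch of the idea described in the abstract; NOT the authors' exact MCSAE.
# Two pooled vectors attend to randomly masked views of each other, and the
# re-weighted outputs are concatenated into the speaker embedding.
# Layer sizes, masking scheme, and attention form are assumptions.
import torch
import torch.nn as nn

class CrossSelfAttentionSketch(nn.Module):
    def __init__(self, dim, mask_prob=0.2):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.mask_prob = mask_prob
        self.scale = dim ** -0.5

    def attend(self, a, b):
        # a attends to a randomly masked view of b (cross attention between pooled outputs)
        if self.training:
            b = b * (torch.rand_like(b) > self.mask_prob).float()
        q, k, v = self.query(a), self.key(b), self.value(b)
        w = torch.softmax(q * k * self.scale, dim=-1)  # feature-wise attention weights
        return a + w * v                               # residual re-weighting of the pooled vector

    def forward(self, pooled_a, pooled_b):
        enc_a = self.attend(pooled_a, pooled_b)
        enc_b = self.attend(pooled_b, pooled_a)
        return torch.cat([enc_a, enc_b], dim=-1)       # concatenated deep speaker embedding

# Example: two pooled outputs (e.g., from different network depths), batch of 8, dim 256
mcsae = CrossSelfAttentionSketch(dim=256)
emb = mcsae(torch.randn(8, 256), torch.randn(8, 256))
print(emb.shape)  # torch.Size([8, 512])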


Related research

07/26/2020 · Double Multi-Head Attention for Speaker Verification
Most state-of-the-art Deep Learning systems for speaker verification are...

08/07/2020 · Disentangled speaker and nuisance attribute embedding for robust speaker verification
Over the recent years, various deep learning-based embedding methods hav...

05/20/2023 · ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention
In this paper, we propose ACA-Net, a lightweight, global context-aware s...

06/28/2022 · Attention-based conditioning methods using variable frame rate for style-robust speaker verification
We propose an approach to extract speaker embeddings that are robust to ...

07/27/2020 · Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System
One of the most important parts of an end-to-end speaker verification sy...

09/24/2019 · Improving Robustness In Speaker Identification Using A Two-Stage Attention Model
In this paper a novel framework to tackle speaker recognition using a tw...

08/03/2020 · Self-attention encoding and pooling for speaker recognition
The computing power of mobile devices limits the end-user applications i...
