Double Multi-Head Attention for Speaker Verification

07/26/2020
by Miquel India, et al.

Most state-of-the-art Deep Learning systems for speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. In this paper, we present Double Multi-Head Attention pooling, which extends our previous approach based on Self Multi-Head Attention. An additional self-attention layer is added to the pooling mechanism to summarize the context vectors produced by Multi-Head Attention into a unique speaker representation. This method enhances the pooling mechanism by weighting the information captured by each head, which yields more discriminative speaker embeddings. We have evaluated our approach on the VoxCeleb2 dataset. Our results show relative improvements in EER of 9.19% and 4.29% over Self-Attention pooling and Self Multi-Head Attention, respectively. These results show that Double Multi-Head Attention is an effective approach for efficiently selecting the most relevant features that CNN-based front-ends capture from the speech signal.
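The abstract describes the pooling mechanism only at a high level. The following PyTorch sketch illustrates one plausible reading of it: a first self multi-head attention stage pools frame-level features into one context vector per head, and a second self-attention stage weights those head context vectors and sums them into a single speaker embedding. The class name DoubleMHAPooling, the parameter names (frame_query, head_query), the softmax scaling, and the choice to keep the final embedding at the per-head dimension are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleMHAPooling(nn.Module):
    """Sketch of double multi-head attention pooling.

    Stage 1: self multi-head attention pools a variable-length sequence
    of frame-level features into one context vector per head.
    Stage 2: an additional self-attention layer weights the head context
    vectors and sums them into a single utterance-level representation.
    """

    def __init__(self, feat_dim: int, num_heads: int):
        super().__init__()
        assert feat_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = feat_dim // num_heads
        # One learnable attention vector per head (frame-level attention).
        self.frame_query = nn.Parameter(torch.randn(num_heads, self.head_dim))
        # One learnable attention vector over the head context vectors.
        self.head_query = nn.Parameter(torch.randn(self.head_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) frame-level features from the CNN front-end.
        b, t, _ = x.shape
        x = x.view(b, t, self.num_heads, self.head_dim)  # split into heads
        # Frame-level attention weights per head: (batch, time, heads).
        frame_scores = torch.einsum('bthd,hd->bth', x, self.frame_query)
        frame_weights = F.softmax(frame_scores / self.head_dim ** 0.5, dim=1)
        # Per-head context vectors: (batch, heads, head_dim).
        context = torch.einsum('bth,bthd->bhd', frame_weights, x)
        # Second attention stage: weight each head's context vector.
        head_scores = torch.einsum('bhd,d->bh', context, self.head_query)
        head_weights = F.softmax(head_scores / self.head_dim ** 0.5, dim=1)
        # Weighted sum over heads -> single speaker embedding (batch, head_dim).
        return torch.einsum('bh,bhd->bd', head_weights, context)

Concatenating the weighted head vectors instead of summing them, or adding a learned projection after the second stage, would be equally plausible variants. The key idea from the abstract is the second attention stage itself, which lets the model down-weight heads that capture less discriminative information for a given utterance.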


