Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems

11/06/2021
by Victoria Mingote, et al.

This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. First, we propose the use of a learnable vector, called the class token, to replace the global average pooling mechanism used to extract the embeddings. Unlike global average pooling, our proposal takes into account the temporal structure of the input, which is relevant for the text-dependent SV task. The class token is concatenated to the input before the first MSA layer, and its state at the output is used to predict the classes. To gain additional robustness, we introduce two further approaches. First, we develop a Bayesian estimation of the class token. Second, following the Knowledge Distillation (KD) philosophy, we add a distillation token for training a teacher-student pair of networks, combined with the class token. This distillation token is trained to mimic the predictions of the teacher network, while the class token is trained to predict the true label. All the strategies have been tested on the RSR2015-Part II and DeepMine-Part 1 databases for text-dependent SV, providing competitive results compared to the same architecture using the average pooling mechanism to extract average embeddings.
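As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch (not the authors' code) of the class-token and distillation-token idea: a learnable class token and a distillation token are prepended to the frame-level input of an MSA encoder, their output states replace average pooling as utterance-level embeddings, and a standard KD loss makes the class token follow the true label while the distillation token mimics the teacher. Names such as MSAEncoder, embed_dim, num_classes, and the hyperparameters T and alpha are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSAEncoder(nn.Module):
    """Stack of Transformer encoder layers acting as the MSA backbone (assumed)."""
    def __init__(self, embed_dim=256, num_heads=4, num_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):
        return self.encoder(x)


class ClassDistillTokenModel(nn.Module):
    """Prepends a learnable class token and a distillation token to the
    frame-level input, so their output states replace average pooling."""
    def __init__(self, embed_dim=256, num_classes=100):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.backbone = MSAEncoder(embed_dim)
        self.cls_head = nn.Linear(embed_dim, num_classes)
        self.dist_head = nn.Linear(embed_dim, num_classes)

    def forward(self, frames):                      # frames: (B, T, D)
        b = frames.size(0)
        cls = self.cls_token.expand(b, -1, -1)
        dist = self.dist_token.expand(b, -1, -1)
        x = torch.cat([cls, dist, frames], dim=1)   # (B, T + 2, D)
        x = self.backbone(x)
        # Output states of the two tokens serve as utterance-level embeddings.
        return self.cls_head(x[:, 0]), self.dist_head(x[:, 1])


def kd_loss(cls_logits, dist_logits, labels, teacher_logits, T=2.0, alpha=0.5):
    """Class-token logits learn the true label; distillation-token logits
    mimic the teacher's softened predictions (standard KD formulation)."""
    ce = F.cross_entropy(cls_logits, labels)
    kd = F.kl_div(F.log_softmax(dist_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction='batchmean') * (T * T)
    return (1 - alpha) * ce + alpha * kd
```

Under this sketch, the Bayesian variant mentioned in the abstract would replace the deterministic `cls_token` parameter with a learned distribution over token vectors; that part is not shown here.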


