Improved Meta-learning training for Speaker Verification

03/29/2021
by Yafeng Chen, et al.

Meta-learning (ML) has recently become a research hotspot in speaker verification (SV). In this paper, we introduce two methods to improve meta-learning training for SV. In the first method, a backbone embedding network is first trained jointly with the conventional cross-entropy loss and the prototypical networks (PN) loss. Then, inspired by speaker adaptive training in speech recognition, additional transformation coefficients are trained with the PN loss only. These coefficients are used to modify the original backbone embedding network during x-vector extraction. In the second method, the random erasing (RE) data augmentation technique is applied to all support samples in each episode to construct positive pairs, and a contrastive loss between the augmented and the original support samples is added to the training objective. Experiments are carried out on the Speakers in the Wild (SITW) and VOiCES databases. Both methods yield consistent improvements over existing meta-learning training frameworks, and combining them yields further improvements on both databases.
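
The two building blocks named in the abstract, the episodic PN loss and random erasing of support samples, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard prototypical-networks formulation (class prototypes as support means, squared Euclidean distance, softmax cross-entropy over queries) and a simple single-rectangle erasure with zero fill. The function names `prototypical_loss` and `random_erase`, the `max_frac` parameter, and the fill value are hypothetical choices made for illustration. The contrastive loss and the transformation coefficients are not shown, because the abstract does not specify their form.

```python
import numpy as np


def prototypical_loss(support, query):
    """Episodic prototypical-networks loss (assumed standard formulation).

    support: (N, K, D) array, K support embeddings for each of N speakers.
    query:   (N, Q, D) array, Q query embeddings for each of N speakers.
    Returns the mean cross-entropy of queries against speaker prototypes.
    """
    n_spk, n_query, dim = query.shape
    prototypes = support.mean(axis=1)                  # (N, D)
    q = query.reshape(-1, dim)                         # (N*Q, D)
    labels = np.repeat(np.arange(n_spk), n_query)      # true speaker per query

    # Squared Euclidean distance from every query to every prototype.
    diff = q[:, None, :] - prototypes[None, :, :]      # (N*Q, N, D)
    logits = -(diff ** 2).sum(axis=-1)                 # (N*Q, N)

    # Numerically stable log-softmax over prototypes.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(labels.size), labels].mean())


def random_erase(features, rng, max_frac=0.3, fill=0.0):
    """Erase one random rectangle in a (T, F) feature matrix.

    max_frac and fill are illustrative assumptions, not values from the paper.
    Returns a new array; the input is left unchanged.
    """
    out = features.copy()
    n_frames, n_bins = out.shape
    height = rng.integers(1, max(2, int(n_frames * max_frac) + 1))
    width = rng.integers(1, max(2, int(n_bins * max_frac) + 1))
    t0 = rng.integers(0, n_frames - height + 1)
    f0 = rng.integers(0, n_bins - width + 1)
    out[t0:t0 + height, f0:f0 + width] = fill
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = rng.normal(size=(5, 2, 16))   # 5 speakers, 2 support samples each
    query = rng.normal(size=(5, 3, 16))     # 3 query samples per speaker
    print("PN loss:", prototypical_loss(support, query))

    fbank = np.ones((100, 40))              # stand-in for a filterbank matrix
    erased = random_erase(fbank, rng)
    print("erased cells:", int((erased == 0.0).sum()))
```

In this sketch, an erased support sample and its original would form the positive pair that the contrastive term described above operates on.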


research
10/21/2020

Multi-task Metric Learning for Text-independent Speaker Verification

In this work, we introduce metric learning (ML) to enhance the deep embe...
research
03/31/2022

Improved Relation Networks for End-to-End Speaker Verification and Identification

Speaker identification systems in a real-world scenario are tasked to id...
research
10/23/2019

Speaker Adaptive Training using Model Agnostic Meta-Learning

Speaker adaptive training (SAT) of neural network acoustic models learns...
research
07/31/2020

Designing Neural Speaker Embeddings with Meta Learning

Neural speaker embeddings trained using classification objectives have d...
research
04/06/2020

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

In realistic settings, a speaker recognition system needs to identify a ...
research
10/24/2019

Meta-learning for robust child-adult classification from speech

Computational modeling of naturalistic conversations in clinical applica...
research
02/04/2022

Distribution Embedding Networks for Meta-Learning with Heterogeneous Covariate Spaces

We propose Distribution Embedding Networks (DEN) for classification with...
