Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

02/07/2022
by   Metehan Cekic, et al.

Speaker recognition, the task of recognizing speaker identities from voice alone, enables important downstream applications such as personalization and authentication. In supervised learning, speaker representations depend heavily on clean and sufficient labeled data, which is difficult to acquire. Noisy unlabeled data, on the other hand, also provides valuable information that can be exploited with self-supervised training methods. In this work, we investigate how to pretrain speaker recognition models by leveraging dialogues between customers and smart-speaker devices. However, the supervisory information in such dialogues is inherently noisy, as multiple speakers may speak to a device in the course of the same dialogue. To address this issue, we propose an effective rejection mechanism that selectively learns from dialogues based on their acoustic homogeneity. We compare both reconstruction-based and contrastive-learning-based self-supervised methods. Experiments demonstrate that the proposed method provides significant performance improvements, superior to earlier work. Dialogue pretraining, when combined with the rejection mechanism, yields a 27.10% equal error rate (EER) reduction in speaker recognition compared to a model without self-supervised pretraining.
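
The abstract does not say how acoustic homogeneity is measured, so the following is only a minimal sketch of what a dialogue rejection step could look like, not the authors' method. It assumes each utterance in a dialogue is already represented by a fixed-length embedding, scores a dialogue by the mean pairwise cosine similarity of its utterances, and drops dialogues that fall below a threshold. The function names, the similarity measure, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]
Dialogue = Sequence[Vector]  # one embedding per utterance (assumed input format)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dialogue_homogeneity(utterances: Dialogue) -> float:
    """Hypothetical homogeneity score: mean pairwise cosine similarity.

    A dialogue with fewer than two utterances has no pairs to compare,
    so it is treated as fully homogeneous (score 1.0).
    """
    pairs = list(combinations(utterances, 2))
    if not pairs:
        return 1.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def reject_heterogeneous(dialogues: Sequence[Dialogue],
                         threshold: float = 0.5) -> List[Dialogue]:
    """Keep only dialogues whose score reaches the (assumed) threshold."""
    return [d for d in dialogues if dialogue_homogeneity(d) >= threshold]


# Toy data: the first dialogue has two aligned embeddings, the second has
# two orthogonal ones, standing in for a multi-speaker dialogue.
single_speaker = [[1.0, 0.0], [2.0, 0.0]]
multi_speaker = [[1.0, 0.0], [0.0, 1.0]]
kept = reject_heterogeneous([single_speaker, multi_speaker], threshold=0.5)
```

In this toy setup only `single_speaker` survives the filter. In practice the threshold would need to be tuned on held-out data, and a pretraining pipeline would apply the filter before using dialogue membership as a supervisory signal.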


research
06/27/2022

Wav2Vec-Aug: Improved self-supervised training with limited data

Self-supervised learning (SSL) of speech representations has received mu...
research
09/21/2023

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

Current speaker recognition systems primarily rely on supervised approac...
research
02/03/2023

SPADE: Self-supervised Pretraining for Acoustic DisEntanglement

Self-supervised representation learning approaches have grown in popular...
research
11/06/2020

Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

This work explores how self-supervised learning can be universally used ...
research
07/12/2022

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

State-of-the-art speaker verification systems are inherently dependent o...
research
09/08/2021

Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension

Multi-party dialogue machine reading comprehension (MRC) brings tremendo...
research
07/06/2023

Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays

This work presents the first applications of self-supervised learning ap...
