1 Introduction
Recently, self-supervised learning (SSL) has achieved state-of-the-art results on a diverse array of downstream speech tasks [wav2vec2, hsu2021hubert, zhang2021bigssl, superb, chen2021unispeech, chen2021wavlm, wang2021self]. Typical SSL methods either discriminate correlated positive samples from negative ones (e.g., wav2vec 2.0 [wav2vec2]) or predict discrete pseudo-labels for the masked regions (e.g., HuBERT [hsu2021hubert]). Both methods implicitly learn short-time phonetic information from a huge amount of unlabeled speech, and mainly target self-supervised learning for the automatic speech recognition task (SSL4ASR).
Due to the high correlation with phoneme units, it is straightforward to see that SSL4ASR has the potential to drastically improve speech recognition. Interestingly, SSL4ASR also achieves state-of-the-art performance on speaker-related tasks, e.g., speaker verification (SV). For instance, WavLM [chen2021wavlm] and BigSSL [zhang2021bigssl] show the best performance on different partitions of the VoxCeleb1 dataset [nagrani2020voxceleb], and an ensemble of the WavLM model and Res2Net [gao2019res2net, zhou2021resnext] ranks at the top of the VoxSRC 2021 speaker verification permanent leaderboard (https://competitions.codalab.org/competitions/34066#results) under the team name StrasbourgSpk.
In this work, our goal is to understand which factors lead to the success of SSL4ASR in speaker recognition. Specifically, we try to answer the following questions:

- Can a supervised ASR model benefit the SV task?
- How does SSL benefit the SV task?
- What is the best SSL setup for the SV task?
To this end, we carefully design and conduct a series of experiments to investigate which parts of SSL are indispensable. We also perform Integrated Gradients attribution analysis and loss landscape visualization to further understand the contribution of SSL to SV performance.
Our main findings are threefold. First, SSL4ASR models transfer significantly better than supervised ASR models in an apples-to-apples comparison, indicating that the SSL objective function is a key ingredient for achieving excellent transferability. Second, the HuBERT-style loss, masked speech prediction, is slightly better than other SSL losses such as contrastive learning and the Mean Squared Error (MSE) loss, while the choice of pseudo-label generation method has only a minor impact on the performance of HuBERT-style models; even pre-training with simple clustering on raw inputs provides good SV performance. The data augmentation proposed in WavLM [chen2021wavlm] is very helpful, even when the pre-training data is scaled up to 94k hours, and both data scale and model scale correlate strongly with model transferability. Third, our analysis shows that SSL models learn speaker-related knowledge only in their shallow layers during pre-training, while the fine-tuning stage can unleash the full capability of the model. We observe that an SSL model provides a wider optimum in fine-tuning, which enables better resistance against small perturbations, stronger generalization capability, and easier SV model optimization.
2 Background
Self-supervised learning (SSL) has been shown to be an effective means of improving state-of-the-art results on the SV task [chen2021unispeech, chen2021large, chen2021wavlm]. A common practice is to first optimize the model with an SSL objective on large-scale unsupervised data, and then fine-tune the pre-trained model together with a downstream SV model on an annotated dataset.
The typical SSL objectives are designed for the automatic speech recognition task (SSL4ASR) by implicitly learning short-time phonetic information from unlabeled speech [wav2vec2, hsu2021hubert]. Specifically, given a raw audio $x$, a latent representation $z = (z_1, \dots, z_T)$ is obtained by a CNN feature extractor, where $T$ is the number of frames. The representation is then fed to an $L$-layer Transformer model, yielding hidden states $h^l = (h^l_1, \dots, h^l_T)$, where $l \in \{1, \dots, L\}$ denotes the $l$-th layer of the encoder. During pre-training, we employ masking-based self-supervised learning methods to optimize the Transformer model: before feeding the latent representation to the Transformer, SSL methods first mask a proportion of the frames at random positions, and then minimize a self-supervised objective function computed on the last-layer hidden states $h^L$ over the set of masked regions $M$.
During fine-tuning, we take a weighted average of the hidden states of each layer to generate the output representation $o = \sum_{l=0}^{L} w_l h^l$, where $w_l$ is a learnable weight for the hidden state of the $l$-th layer and $h^0$ denotes the output of the CNN feature extractor. We then employ ECAPA-TDNN [desplanques2020ecapa] as the downstream SV model following [chen2021large], and feed the output representation into the downstream model to generate the speaker embedding. We use the additive angular margin (AAM) loss [deng2019arcface] as the supervised objective function, and train the downstream SV model together with the pre-trained model in two stages. In the first stage, we optimize the parameters of the downstream model while keeping the pre-trained parameters fixed. In the second stage, we continue to optimize the parameters of the downstream model as well as the pre-trained model. In addition, we can apply a large-margin fine-tuning strategy and score calibration to further improve speaker verification performance [thienpondt2021idlab].
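For concreteness, the weighted layer average can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; in particular, the softmax normalization of the learnable weights is our assumption, since the text does not specify how the weights are normalized.

```python
import numpy as np

def weighted_layer_average(hidden_states, weights):
    """Combine per-layer hidden states h^l into a single representation
    o = sum_l softmax(w)_l * h^l, as used when feeding an SSL encoder's
    outputs to a downstream speaker-verification head.
    NOTE: softmax normalization of `weights` is an assumption."""
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()                          # normalize over layers
    stacked = np.stack(hidden_states)        # (L+1, T, D)
    return np.tensordot(w, stacked, axes=1)  # (T, D)

# toy example: 3 "layers", 4 frames, 2 features; equal weights -> layer mean
layers = [np.full((4, 2), float(l)) for l in range(3)]
out = weighted_layer_average(layers, np.zeros(3))
```

With zero (i.e., equal) weights the result is simply the mean over layers; during fine-tuning the weights are trained jointly with the downstream model.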
3 Why does SSL4ASR Benefit the SV task?
3.1 Can a supervised ASR model benefit the SV task?
Given the similar modeling units of SSL4ASR and supervised ASR models, a natural question is whether a supervised ASR model can also benefit the speaker verification task. To verify this, we compare the transferability of supervised ASR and SSL4ASR models, both trained on LibriSpeech 960h [librispeech] with the Transformer structure of HuBERT [hsu2021hubert].
The ASR model is trained with the Connectionist Temporal Classification (CTC) loss [ctc] in a supervised way. We use the character sequence as the golden target labels $y$, and require the ASR model to predict them given the hidden states of the last encoder layer $h^L$: $\mathcal{L}_{\text{CTC}} = -\log p_{\text{CTC}}(y \mid h^L)$. SpecAugment is also applied following [specaug].

HuBERT, based on a masked pseudo-label prediction loss, is selected as the SSL4ASR model for the comparison [hsu2021hubert]. The pseudo labels are generated by iterative clustering. In the first iteration, we conduct an offline clustering step on the MFCC features of the input audio, where the index of each frame's cluster center is used as its pseudo label $c_t$. We then use the hidden states to predict the embeddings corresponding to the pseudo labels with a cross-entropy loss over the masked regions $M$:

$\mathcal{L} = -\sum_{t \in M} \log \frac{\exp(\mathrm{sim}(W h^L_t, e_{c_t}) / \tau)}{\sum_{c'=1}^{C} \exp(\mathrm{sim}(W h^L_t, e_{c'}) / \tau)}$,

where $W$ is the projection matrix, $e_c$ is the embedding of cluster $c$, $\mathrm{sim}(\cdot, \cdot)$ denotes the cosine similarity function, $\tau$ is a predefined temperature hyperparameter, and $C$ is the number of clusters. From the second iteration onward, we perform the offline clustering step on hidden states extracted from the previous iteration's pre-trained HuBERT model, and then train a new HuBERT model with the pseudo labels given by the new clustering centers.

We also use a randomly initialized Transformer model as a baseline, to rule out the effect of the additional parameters introduced by the pre-trained model and focus on the performance of the different pre-training methods.
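The masked pseudo-label prediction loss can be illustrated with a small numpy sketch. All values below are toy values, and the variable names ($W$ for the projection matrix, $E$ for the cluster embeddings, $\tau$ for the temperature) mirror the equation above; this is not the authors' implementation.

```python
import numpy as np

def hubert_masked_ce(h_last, labels, W, E, mask, tau=0.1):
    """Masked pseudo-label prediction loss (sketch): cross-entropy over
    cosine similarities between projected hidden states W @ h_t and the
    cluster embeddings, computed only on the masked frames."""
    proj = h_last @ W.T                                     # (T, D')
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    logits = proj @ En.T / tau                              # (T, C)
    logits -= logits.max(axis=1, keepdims=True)             # stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[mask, labels[mask]].mean()

# toy setup: 2 clusters; hidden states aligned with their cluster embeddings
E = np.eye(2)                  # cluster embeddings e_1, e_2
W = np.eye(2)                  # projection matrix
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
mask = np.array([True, True, False])   # loss only on masked frames
good = hubert_masked_ce(h, np.array([0, 1, 0]), W, E, mask)
bad = hubert_masked_ce(h, np.array([1, 0, 0]), W, E, mask)
```

When the pseudo labels match the nearest embeddings the loss is near zero; flipping the labels makes it large, which is exactly the signal the masked prediction objective provides.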
Table 1: EER (%) on the VoxCeleb1 trial lists.

| Model  | Vox1-O | Vox1-E | Vox1-H |
|--------|--------|--------|--------|
| FBank  | 1.01   | 1.24   | 2.32   |
| Random | 3.696  | 3.71   | 6.034  |
| CTC    | 1.159  | 1.256  | 2.434  |
| HuBERT | 0.84   | 0.879  | 1.726  |
Table 1 shows that the SSL4ASR model provides a better representation than the handcrafted FBank feature, while the representations from the CTC-trained ASR model and the randomly initialized Transformer are inferior to FBank. This indicates that the key to the success of SSL4ASR on the SV task is neither the Transformer structure nor the fine-tuning pipeline, but the self-supervised learning procedure.
3.2 What is the best SSL objective for the SV task?
Besides HuBERT, which is based on a masked pseudo-label prediction loss, we also evaluate the transferability of wav2vec 2.0 [wav2vec2] and a pre-training method based on the Mean Squared Error (MSE) loss. Note that all three methods use the same mask settings proposed in HuBERT.
MSE first computes the FBank features $f_t$ of the raw audio, then takes as the objective the mean squared error between the FBank features and a linear projection of the last-layer hidden states in the masked regions: $\mathcal{L} = \sum_{t \in M} \| W h^L_t - f_t \|_2^2$, where $W$ is the projection matrix.
Wav2vec 2.0 first discretizes the latent representation $z_t$ of each masked time step into a quantized latent representation $q_t$, then uses the context representation $h^L_t$ to identify the true quantized latent representation out of a set of candidate representations $Q_t$ with a contrastive loss:

$\mathcal{L} = -\sum_{t \in M} \log \frac{\exp(\mathrm{sim}(W h^L_t, q_t) / \tau)}{\sum_{\tilde{q} \in Q_t} \exp(\mathrm{sim}(W h^L_t, \tilde{q}) / \tau)}$,

where $W$ is the projection matrix, $\mathrm{sim}(\cdot, \cdot)$ denotes the cosine similarity function, and $\tau$ is a predefined temperature hyperparameter.
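The contrastive identification step can likewise be sketched in numpy. The candidate sets below (the true quantized vector plus distractors from other frames) and their encoding as index tuples are simplifying assumptions for illustration, not wav2vec 2.0's actual distractor sampling.

```python
import numpy as np

def contrastive_loss(context, quantized, cand_idx, tau=0.1):
    """wav2vec 2.0-style contrastive loss (sketch): for each masked frame t,
    identify the true quantized vector q_t among a candidate set Q_t (the
    true one plus distractors from other frames) via cosine similarity."""
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    q = quantized / np.linalg.norm(quantized, axis=1, keepdims=True)
    total = 0.0
    for t, cands in enumerate(cand_idx):   # convention: cands[0] == t (true)
        sims = c[t] @ q[list(cands)].T / tau
        sims -= sims.max()                 # numerical stability
        total += -(sims[0] - np.log(np.exp(sims).sum()))
    return total / len(cand_idx)

# toy example: 3 frames whose context vectors match their quantized targets
q = np.eye(3)
cands = [(0, 1, 2), (1, 0, 2), (2, 0, 1)]
aligned = contrastive_loss(q, q, cands)
shuffled = contrastive_loss(q[[1, 2, 0]], q, cands)
```

The loss is near zero when each context vector matches its own quantized target and grows large when the pairing is broken, which is what drives the representation learning.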
Table 2: EER (%) on the VoxCeleb1 trial lists for different SSL objectives.

| Model       | Vox1-O | Vox1-E | Vox1-H |
|-------------|--------|--------|--------|
| MSE         | 0.979  | 1.075  | 1.98   |
| wav2vec 2.0 | 0.973  | 0.933  | 1.831  |
| HuBERT      | 0.84   | 0.879  | 1.726  |
Table 2 demonstrates that all three SSL methods provide better representations than the FBank feature, which we attribute to contextual speech representation learning from masked speech. HuBERT achieves the best performance, indicating that the pseudo-label prediction loss generalizes better than the contrastive and MSE losses.
3.3 What is the best SSL quantizer for the SV task?
Since the HuBERT-style loss outperforms the others, we explore the performance of different pseudo-label creation methods (quantizers) for the HuBERT loss. Besides the MFCC clustering and hidden-state clustering introduced by HuBERT, we also experiment with labels obtained by random projection [chiu2022self], a VQ-VAE quantizer [van2017neural], and frame-phoneme alignment.
With the random projection quantizer, we first extract the FBank features $f_t$ of the input audio, project them to vectors $v_t = A f_t$ with a randomly initialized matrix $A$, and then find the closest vector in a set of randomly initialized vectors $\{e_1, \dots, e_V\}$, where $V$ is the number of vectors (codes). The pseudo label of the $t$-th frame is defined as the index of the closest vector: $y_t = \arg\min_i \| e_i - v_t \|_2$.

With the VQ-VAE quantizer, we first extract the FBank features of the input audio and train a VQ-VAE model [van2017neural] to reconstruct them on LibriSpeech 960h [librispeech]. Given the latent variable $z_t$ obtained by a Transformer-based encoder, we discretize it to the closest vector in a latent embedding space $\{e_1, \dots, e_V\}$, where $V$ is the number of embeddings, and then reconstruct the features with a Transformer-based decoder. The training loss of the VQ-VAE minimizes the mean squared error between the reconstructed features $\hat{f}_t$ and the input features, along with the difference between the encoded variable and the discrete variable:

$\mathcal{L} = \sum_t \left( \| \hat{f}_t - f_t \|_2^2 + \| \mathrm{sg}(z_t) - e \|_2^2 + \beta \| z_t - \mathrm{sg}(e) \|_2^2 \right)$,

where $\mathrm{sg}(\cdot)$ is the stop-gradient operator and $\beta$ is a predefined hyperparameter. During inference, the pseudo label of the $t$-th frame is defined as the index of the closest discrete latent variable in the latent embedding space: $y_t = \arg\min_i \| e_i - z_t \|_2$.
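The random projection quantizer is simple enough to sketch end to end. Both the projection matrix and the codebook are frozen random parameters; the l2-normalization before the nearest-neighbor search follows common practice for this quantizer but is an assumption here, as the text does not state the distance used.

```python
import numpy as np

def random_projection_labels(feats, code_dim, num_codes, seed=0):
    """Assign each frame the index of the nearest code in a frozen random
    codebook, after projecting its FBank feature with a frozen random
    matrix A (pseudo label y_t = argmin_i ||e_i - v_t||)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((code_dim, feats.shape[1]))    # random projection
    codebook = rng.standard_normal((num_codes, code_dim))  # random codes e_i
    v = feats @ A.T
    v = v / np.linalg.norm(v, axis=1, keepdims=True)       # assumption: l2-norm
    e = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    dists = ((v[:, None, :] - e[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)                            # pseudo labels y_t

frames = np.random.default_rng(1).standard_normal((6, 4))  # 6 toy frames
labels = random_projection_labels(frames, code_dim=3, num_codes=8)
```

Because nothing is trained, the quantizer is deterministic for a fixed seed, which is what makes it such a cheap pseudo-label source.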
In addition, we also consider using the phoneme sequence of the input audio as pseudo labels, to see whether ASR-related pseudo labels can benefit SV performance. Here, we use a forced-alignment tool [mcauliffe17_interspeech] to obtain frame-phoneme pairs on the LibriSpeech 960h data.
Table 3: EER (%) on the VoxCeleb1 trial lists for different quantizers.

| Model                          | Vox1-O | Vox1-E | Vox1-H |
|--------------------------------|--------|--------|--------|
| MFCC Clustering                | 0.872  | 0.917  | 1.766  |
| Hidden State Clustering        | 0.840  | 0.879  | 1.726  |
| Random Projection (500 codes)  | 0.899  | 0.95   | 1.775  |
| Random Projection (8192 codes) | 0.883  | 0.903  | 1.675  |
| VQ-VAE                         | 0.824  | 0.899  | 1.655  |
| Phoneme                        | 0.867  | 0.918  | 1.776  |
Table 3 shows that all the quantizers perform similarly on the speaker verification task. Even when we use the phoneme sequence as pseudo labels, which is irrelevant to speaker information, we can still obtain a well-performing speaker verification model with the masked pseudo-label prediction SSL method.
3.4 Large-Scale SSL for the SV task
Moreover, we leverage data augmentation and a scale-up strategy to further strengthen self-supervised learning for the speaker verification task. Following WavLM [chen2021wavlm], we employ the masked speech denoising and prediction framework as a data-augmented self-supervised learning method, improving the pre-trained model's robustness to complex acoustic environments and its preservation of speaker identity. We also scale up the unlabeled pre-training data to 94k hours of public audio [chen2021wavlm], including 60k hours of Libri-Light [librilight], 10k hours of GigaSpeech [GigaSpeech2021], and 24k hours of VoxPopuli [wang2021voxpopuli], and enlarge the model to a 24-layer Transformer with 316M parameters.
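The augmentation idea, overlaying an interfering signal onto part of the input while the prediction targets stay those of the clean utterance, can be sketched as follows. The segment-length range and SNR scaling details here are illustrative assumptions, not WavLM's exact recipe.

```python
import numpy as np

def mix_for_denoising(wave, interferer, snr_db, rng):
    """Overlay an interfering segment (noise or another utterance) onto a
    random region of `wave` at the requested SNR; returns the noisy input.
    Targets for masked prediction remain those of the clean utterance.
    NOTE: segment-length range and SNR handling are illustrative."""
    seg_len = int(rng.integers(1, len(wave) // 2 + 1))   # cover at most half
    start = int(rng.integers(0, len(wave) - seg_len + 1))
    seg = interferer[:seg_len]
    p_sig = (wave[start:start + seg_len] ** 2).mean() + 1e-12
    p_int = (seg ** 2).mean() + 1e-12
    scale = np.sqrt(p_sig / (p_int * 10 ** (snr_db / 10)))
    noisy = wave.copy()
    noisy[start:start + seg_len] += scale * seg
    return noisy

rng = np.random.default_rng(0)
clean = np.ones(100)
noisy = mix_for_denoising(clean, 0.5 * np.ones(100), snr_db=5.0, rng=rng)
```

Training the model to predict clean-speech pseudo labels from such corrupted inputs is what encourages robustness to interfering speakers and noise.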
Table 4: EER (%) on the VoxCeleb1 trial lists for large-scale SSL models.

| Model                                               | Vox1-O | Vox1-E | Vox1-H |
|-----------------------------------------------------|--------|--------|--------|
| HuBERT 960h                                         | 0.84   | 0.879  | 1.726  |
| WavLM 960h                                          | 0.777  | 0.829  | 1.629  |
| HuBERT 94kh                                         | 0.734  | 0.847  | 1.725  |
| WavLM 94kh                                          | 0.739  | 0.742  | 1.483  |
| WavLM 94kh Large                                    | 0.505  | 0.579  | 1.176  |
| WavLM 94kh Large (+ large-margin FT and calibration)| 0.308  | 0.462  | 0.906  |
Table 4 shows that the data augmentation strategy used in WavLM successfully benefits self-supervised learning for the SV task. The performance improvement is more significant when the pre-training data is scaled up to 94kh. Thanks to the larger parameter capacity, the WavLM Large model brings more than 20% EER reduction compared to the WavLM Base model. With the large-margin fine-tuning strategy and score calibration methods, the WavLM Large model achieves 33.2%, 27.1%, and 8.8% relative EER reductions compared to the state-of-the-art supervised model (Vox1-O: 0.461, Vox1-E: 0.634, Vox1-H: 0.993) [zhao2021speakin] on the three VoxCeleb1 trial lists.
4 Discussion and Analysis
4.1 Contribution Attribution
We employ the Integrated Gradients (IG) attribution method [sundararajan2017axiomatic] to demonstrate how each layer of the pre-trained model contributes to the final SV performance. Compared with the method in [chen2021unispeech, chen2021wavlm], IG models contribution estimation better, as it considers not only the layer weight but also the magnitude of each layer's hidden states. Specifically, given a well-trained downstream model $F$, the hidden states $h^l$ extracted from all layers, and the corresponding learned weights $w_l$, the attribution score of the $l$-th layer's hidden states is assigned as:

$\mathrm{attr}(h^l) = \mathrm{sum}\left( (w_l h^l) \odot \int_{0}^{1} \frac{\partial F(\alpha o)}{\partial (w_l h^l)} \, d\alpha \right)$,

where $\odot$ denotes the Hadamard product, $\alpha$ is the integral variable, and $\mathrm{sum}(\cdot)$ denotes summation over the time and feature dimensions. A larger attribution score indicates that the corresponding hidden states are more important. By the completeness property of IG, the attribution scores of all hidden states sum to the final prediction of the SV model, i.e., $\sum_{l} \mathrm{attr}(h^l) = F(o) - F(0)$. Since the integral is intractable, we approximate it with a summation of gradients:

$\mathrm{attr}(h^l) \approx \mathrm{sum}\left( (w_l h^l) \odot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(\frac{k}{m} o)}{\partial (w_l h^l)} \right)$,

where $m$ is the number of approximation steps for computing the integrated gradients. We set $m$ to 50 in our experiments.
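The gradient-summation approximation can be checked on a toy function. This is a sketch with a zero baseline; `F` and `grad_F` are stand-ins for the SV model output and its gradient with respect to the input, not the actual model.

```python
import numpy as np

def integrated_gradients(grad_F, x, steps=50):
    """Riemann approximation of Integrated Gradients with a zero baseline:
    attr_i = x_i * (1/m) * sum_k dF(k/m * x)/dx_i.
    `grad_F` returns the gradient of the model output w.r.t. its input."""
    acc = np.zeros_like(x)
    for k in range(1, steps + 1):
        acc += grad_F(k / steps * x)
    return x * acc / steps

# toy "model" F(x) = sum(x_i^2), whose exact IG attribution is x_i^2;
# by completeness, the attributions sum to F(x) - F(0)
F = lambda v: float((v ** 2).sum())
grad_F = lambda v: 2.0 * v
x = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(grad_F, x, steps=2000)
```

For this quadratic toy model the exact attribution of each coordinate is $x_i^2$, so the approximation error shrinks as the number of steps grows, which is why a moderate step count (50 in our experiments) suffices in practice.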
Figure 2 shows the contribution attribution of each layer of the different pre-trained models. In the first stage of fine-tuning, where we train the downstream model with the pre-trained parameters fixed, the contribution mostly comes from the output of the CNN feature extractor and the first encoder layer for all pre-trained models. This indicates that only the shallow layers of the pre-trained models learn speaker-related information during the self-supervised learning procedure. If the hidden states are extracted from the ASR model trained with the supervised CTC loss, only the latent features extracted by the CNN extractor contribute to the final prediction; if they are extracted from an SSL4ASR model, such as wav2vec 2.0 or HuBERT, the contribution is likewise dominated by the CNN features. In contrast, if we pre-train HuBERT with data augmentation or a phoneme-independent quantizer, such as MFCC clustering or random projection, the hidden states encoded by the Transformer layers contribute more.
In the second stage of fine-tuning, we update the parameters of the downstream model as well as the pre-trained parameters. Since this unleashes the full capability of the pre-trained model, the higher Transformer encoder layers can also learn to model speaker information under the SV training objective, and they contribute more to the final prediction than in the first stage, leading to better speaker verification performance.
4.2 Loss Landscape Visualization
To better understand how self-supervised learning benefits the SV task, we visualize and compare the two-dimensional loss landscapes and optimization trajectories of different SV models. For a fair comparison of different input features, we plot over the parameters of the downstream models, using the optimization trajectories of the first fine-tuning stage, where the pre-trained parameters are kept frozen.
Following [li2018visualizing, hao2019visualizing], we define the origin of the loss surface as the randomly initialized downstream model's parameters, and its two axes as two directions in the parameter space. We then uniformly sample multiple points around the initialized parameters, and plot the training loss of the downstream model at each sampled point, given the input features from the pre-trained model.
Let $\theta_0$ and $\theta_T$ denote the randomly initialized and well-trained parameters of the SV downstream model, respectively. We define one axis as the optimization direction $u = \theta_T - \theta_0$. The other axis is set as a random direction $v = \theta_r - \theta_0$, where $\theta_r$ are randomly generated parameters. Owing to the high-dimensional parameter space, experimental results confirm that the two axes $u$ and $v$ are divergent and nearly orthogonal to each other. The 2D loss surface can then be plotted with the function $f(\alpha, \beta) = \mathcal{L}(\theta_0 + \alpha u + \beta v)$, where $\alpha$ and $\beta$ are scalar values and $\mathcal{L}$ is the training loss of the SV model. For better visualization, we rescale the second direction to the same norm as the first by $v \leftarrow v \cdot \frac{\|u\|}{\|v\|}$, where $\|\cdot\|$ is the Euclidean norm, and uniformly sample 29 points for each axis within a fixed range of $\alpha$ and $\beta$. In addition, we project the optimization trajectory of the SV downstream model onto the two-dimensional loss surface. Specifically, let $\theta_i$ denote the parameters of the downstream model at the $i$-th training epoch and $d_i = \theta_i - \theta_0$ the optimization direction at the $i$-th epoch. We calculate the cosine similarity between the optimization direction and each of the projection axes, $\cos(d_i, u)$ and $\cos(d_i, v)$, and the corresponding projected point of $\theta_i$ on the 2D loss surface is then $(\alpha_i, \beta_i) = \left( \frac{\|d_i\| \cos(d_i, u)}{\|u\|}, \frac{\|d_i\| \cos(d_i, v)}{\|v\|} \right)$.

Figure 3 shows the visualization of the speaker verification downstream model with different input features. Compared with the FBank feature, the representation from the randomly initialized WavLM model provides a wider optimum, which gives better resistance against small perturbations and makes SV model optimization easier. However, without self-supervised pre-training, the speaker verification model gets stuck in a poor local minimum with worse speaker verification performance. With large-scale self-supervised learning, the pre-trained WavLM representation provides a better initial point with a much broader and deeper optimum area. Even under small disturbances, the WavLM input features enable the downstream model to converge to the expected optimal region, preventing it from skipping over the optimal region due to a steep loss hill.
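The surface construction described above can be sketched as follows. This is a toy sketch: `loss_fn` stands in for the SV training loss, the parameters are flattened vectors, and the grid ranges are illustrative.

```python
import numpy as np

def loss_surface(loss_fn, theta0, theta_T, theta_r, alphas, betas):
    """Evaluate f(a, b) = L(theta0 + a*u + b*v) on a grid, where
    u = theta_T - theta0 is the optimization direction and v is a random
    direction rescaled to the same Euclidean norm as u."""
    u = theta_T - theta0
    v = theta_r - theta0
    v = v * np.linalg.norm(u) / np.linalg.norm(v)   # match norms
    return np.array([[loss_fn(theta0 + a * u + b * v) for b in betas]
                     for a in alphas])

# toy quadratic loss whose minimum sits at the "trained" parameters theta_T
theta0, theta_T = np.zeros(4), np.ones(4)
theta_r = np.array([1.0, -1.0, 0.5, -0.5])          # random direction endpoint
loss = lambda th: float(((th - theta_T) ** 2).sum())
grid = loss_surface(loss, theta0, theta_T, theta_r,
                    alphas=np.linspace(-1, 2, 7), betas=np.linspace(-1, 1, 5))
```

On this toy surface the minimum lands at $(\alpha, \beta) = (1, 0)$, i.e., exactly at the trained parameters along the optimization axis, which is the behavior the visualization relies on.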
5 Conclusion
Our experimental results demonstrate that the self-supervised learning procedure is the key to success on the SV task. Among a variety of SSL methods, the masked pseudo-label prediction loss provides the representation with the best generalization capability on the SV task, regardless of the pseudo-label creation method. We also show that data augmentation and model scale-up further strengthen SSL for the SV task. Moreover, our analyses show that two-stage fine-tuning makes use of the full capacity of SSL models, and that SSL models facilitate SV model optimization by providing a better initial point with a broader and deeper optimum area.