Momentum Contrast Speaker Representation Learning

10/22/2020
by Jangho Lee, et al.

Unsupervised representation learning has made remarkable progress in narrowing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend unsupervised learning to the speech domain, we propose Momentum Contrast for VoxCeleb (MoCoVox) as a learning mechanism. We pre-trained MoCoVox on VoxCeleb1 using instance discrimination as the pretext task. Applying MoCoVox to speaker verification shows that it outperforms the state-of-the-art metric learning-based approach by a large margin. We also empirically characterize contrastive learning in the speech domain by analyzing the distribution of the learned representations. Furthermore, we explore which pretext task is suitable for speaker verification. We expect that learning speaker representations without human supervision will help address open-set speaker recognition.
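As a rough illustration of the two ingredients that momentum contrast with instance discrimination generally relies on, the sketch below computes an InfoNCE-style contrastive loss for one query against one positive key and a set of negative keys, and applies a momentum (moving-average) update to a key encoder's parameters. It is a minimal sketch in plain Python, not the MoCoVox implementation: the function names, the flat parameter lists, and the temperature and momentum values are all illustrative assumptions rather than details taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for a single query (hypothetical helper).

    The positive key is another view of the same instance; the negative
    keys come from other instances. temperature=0.07 is an illustrative
    value, not one reported in this abstract.
    """
    q = l2_normalize(query)
    # Logit 0 is the positive pair; the remaining logits are negatives.
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # Cross-entropy with the positive pair as the target class.
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, momentum=0.999):
    """Move key-encoder parameters slowly toward the query encoder.

    Parameters are flat lists of floats here purely for illustration;
    momentum=0.999 is an assumed value.
    """
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    negatives = [[0.0, 1.0], [-1.0, 0.0]]
    print(info_nce_loss(query, [0.9, 0.1], negatives))  # positive close to query
    print(momentum_update([1.0, 2.0], [0.0, 0.0], momentum=0.9))
```

The loss is small when the query is closer to its positive key than to any negative, which is what instance discrimination rewards; the slow key-encoder update is intended to keep the keys consistent across training steps.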


