
Momentum Contrast Speaker Representation Learning

by Jangho Lee, et al.

Unsupervised representation learning has made remarkable progress in narrowing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend unsupervised learning to the speech domain, we propose Momentum Contrast for VoxCeleb (MoCoVox) as a learning mechanism. We pre-trained MoCoVox on VoxCeleb1 using instance discrimination as the pretext task. Applying MoCoVox to speaker verification revealed that it outperforms the state-of-the-art metric learning-based approach by a large margin. We also empirically demonstrate the properties of contrastive learning in the speech domain by analyzing the distribution of the learned representations. Furthermore, we explore which pretext task is adequate for speaker verification. We expect that learning speaker representations without human supervision will help address open-set speaker recognition.
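
The abstract names momentum contrast and instance discrimination but gives no implementation details. As a rough illustration only, the generic momentum-contrast recipe can be sketched as follows: a key encoder that tracks the query encoder through a moving average, a queue of past keys used as negatives, and an InfoNCE-style loss. This is not the authors' code. All function names are hypothetical, encoder parameters are flattened to plain lists of floats, and the values `m=0.999` and `temperature=0.07` are illustrative assumptions rather than settings reported for MoCoVox.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Move key-encoder parameters slowly toward the query encoder (EMA).

    Assumption: parameters are flattened to plain lists of floats.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, queue, temperature=0.07):
    """Contrastive loss for one query embedding.

    The positive key comes from another view of the same utterance
    (instance discrimination); every embedding in `queue` is treated
    as a negative. The temperature value is an illustrative assumption.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in queue]
    logits = [dot(q, k) / temperature for k in keys]
    # Numerically stable log-sum-exp; the positive sits at index 0.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[0]


def update_queue(queue, new_keys):
    """FIFO queue: append the newest keys and drop the same number of oldest."""
    return (queue + new_keys)[len(new_keys):]
```

In this sketch the loss is small when the query is close to its positive key and far from the queued negatives, and the queue keeps a fixed size so that the set of negatives is decoupled from the batch size.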



Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

In this paper, we propose a simple but powerful unsupervised learning me...

Learning Speaker Embedding with Momentum Contrast

Speaker verification can be formulated as a representation learning task...

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

In this study, we investigate self-supervised representation learning fo...

Contrastive Predictive Coding Based Feature for Automatic Speaker Verification

This thesis describes our ongoing work on Contrastive Predictive Coding ...

Improved Baselines with Momentum Contrastive Learning

Contrastive unsupervised learning has recently shown encouraging progres...

Designing an Effective Metric Learning Pipeline for Speaker Diarization

State-of-the-art speaker diarization systems utilize knowledge from exte...

Learning Decoupling Features Through Orthogonality Regularization

Keyword spotting (KWS) and speaker verification (SV) are two important t...