Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

by Sung-Feng Huang, et al.

Speech separation is a well-developed field, yet open problems remain. The problem we focus on in this paper is the frequent label-permutation switching that occurs under permutation invariant training (PIT): for N-speaker separation there are N! possible label permutations, and stably selecting the correct one is a long-standing problem. In this paper, we utilize self-supervised pre-training to stabilize the label permutations. Among several types of self-supervised tasks, speech-enhancement-based pre-training proves the most effective in our experiments. With off-the-shelf pre-trained models, training duration can be shortened to between one-third and two-thirds of the baseline. Furthermore, even when pre-training time is taken into account, the entire training process can still be shorter, without a performance drop, when using a larger batch size.
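To make the N! permutation problem concrete, here is a minimal, framework-free sketch of how a PIT loss selects a label assignment. The `mse` helper and the toy signals are hypothetical stand-ins; real systems typically use a loss such as negative SI-SNR over waveforms or spectrogram masks.

```python
# Minimal sketch of permutation invariant training (PIT) loss selection.
# All N! speaker-label assignments are scored and the cheapest is kept.
from itertools import permutations

def mse(est, ref):
    """Mean squared error between two equal-length signals (illustrative loss)."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

def pit_loss(estimates, references):
    """Try every permutation of speaker labels; return the lowest average
    loss and the permutation that achieves it."""
    n = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):  # N! candidate label assignments
        loss = sum(mse(estimates[i], references[p])
                   for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy 2-speaker example: the estimates are the references with labels swapped.
refs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ests = [refs[1], refs[0]]
loss, perm = pit_loss(ests, refs)
# PIT recovers the swapped assignment (1, 0) with zero loss.
```

Because the chosen permutation is re-evaluated at every training step, it can flip between steps; this switching is the instability the paper targets.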
