Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

10/29/2020
by   Sung-Feng Huang, et al.

Speech separation is well developed, yet several problems remain unsolved. The main problem we address in this paper is the frequent label-permutation switching of permutation invariant training (PIT). For N-speaker separation there are N! possible label permutations, and how to stably select the correct one is a long-standing problem. In this paper, we utilize self-supervised pre-training to stabilize the label permutations. Among several types of self-supervised tasks, speech-enhancement-based pre-training tasks show significant effectiveness in our experiments. When using off-the-shelf pre-trained models, training duration can be shortened to one-third to two-thirds. Furthermore, even taking pre-training time into account, the entire training process can still be shorter, with no performance drop, when a larger batch size is used.
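To make the permutation-switching problem concrete, the following is a minimal sketch of a PIT-style objective: it evaluates all N! assignments of estimated sources to reference speakers and keeps the best-scoring one. The `si_snr` metric and function names are illustrative assumptions, not the paper's exact implementation (which trains neural separators on batched tensors).

```python
import itertools

import numpy as np


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) between an estimated and a reference source.
    Illustrative metric; the paper's actual training objective may differ."""
    est = est - est.mean()
    ref = ref - ref.mean()
    proj = (est @ ref) / (ref @ ref + eps) * ref  # project estimate onto reference
    noise = est - proj
    return 10 * np.log10((proj @ proj) / (noise @ noise + eps) + eps)


def pit_loss(estimates, references):
    """Permutation invariant training loss: try all N! assignments of
    estimates to references and keep the best one. The chosen permutation
    can switch between training steps, which is the instability the paper
    targets."""
    n = len(references)
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n)):
        score = np.mean([si_snr(estimates[p], references[i])
                         for i, p in enumerate(perm)])
        if score > best_score:
            best_score, best_perm = score, perm
    return -best_score, best_perm  # negative SI-SNR as the loss
```

Because the argmin over permutations is recomputed every step, small changes in the model's outputs can flip the selected permutation, producing the label switching that self-supervised pre-training is shown to reduce.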
