Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training

12/21/2021
by   Yi Li, et al.
0

In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existed pre-tasks. In this paper, we propose a multi-task pre-training method to improve the speech enhancement performance with self-supervised learning. Within the pre-training autoencoder (PAE), only a limited set of clean speech signals are required to learn their latent representations. Meanwhile, to solve the limitation of single pre-task, the proposed masking module exploits the dereverberated mask and estimated ratio mask to denoise the mixture as the second pre-task. Different from the PAE, where the target speech signals are estimated, the downstream task autoencoder (DAE) utilizes a large number of unlabeled and unseen reverberant mixtures to generate the estimated mixtures. The trained DAE is shared by the learned representations and masks. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms the state-of-the-art approaches.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/21/2021

Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent

Recently, self-supervised learning (SSL) techniques have been introduced...
research
06/10/2022

Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation

Self-supervised learning (SSL) achieves great success in monaural speech...
research
06/18/2020

Self-supervised Learning for Speech Enhancement

Supervised learning for single-channel speech enhancement requires caref...
research
10/29/2020

Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Speech separation has been well-developed while there are still problems...
research
04/18/2022

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

The self-supervised Masked Image Modeling (MIM) schema, following "mask-...
research
02/17/2022

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

We present RemixIT, a simple yet effective self-supervised method for tr...

Please sign up or login with your details

Forgot password? Click here to reset