Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization

09/28/2022
by Xiao-Ying Zhao, et al.

With the development of deep learning, neural network-based speech enhancement (SE) models have shown excellent performance. Meanwhile, self-supervised pre-trained models have been shown to benefit a wide range of downstream tasks. In this paper, we consider applying a pre-trained model to the real-time SE problem. Specifically, the encoder and bottleneck layer of the DEMUCS model are initialized from the self-supervised pre-trained WavLM model. The convolutions in the encoder are replaced by causal convolutions, and the transformer encoder in the bottleneck layer uses a causal attention mask. In addition, because discretizing the noisy speech representations is more beneficial for denoising, we use a quantization module to discretize the representations output by the bottleneck layer; the discretized representations are then fed into the decoder to reconstruct the clean speech waveform. Experimental results on the Valentini dataset and an internal dataset show that initialization from the pre-trained model improves SE performance, and that the discretization operation suppresses the noise component in the representations to some extent, further improving performance.
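The three mechanisms named above (causal convolution, a causal attention mask, and discretization by a quantization module) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `causal_conv1d`, `causal_attention_mask`, `masked_softmax_row` and `vector_quantize`, the single-channel stride-1 convolution, and the fixed toy codebook are all assumptions made for illustration. DEMUCS actually uses strided multi-channel convolutions, and a trained quantizer would also need a learned codebook plus some way to pass gradients through the non-differentiable nearest-codeword lookup (for example a straight-through estimator), none of which is shown here.

```python
import math
from typing import List, Tuple

def causal_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Illustrative causal 1-D convolution (stride 1, one channel).

    y[t] depends only on x[t], x[t-1], ..., x[t-K+1]: the input is
    left-padded with K-1 zeros so no future sample reaches the output,
    which is what makes frame-by-frame (real-time) processing possible.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    # kernel[j] weights the sample j steps in the past, x[t-j]
    return [
        sum(kernel[j] * padded[t + k - 1 - j] for j in range(k))
        for t in range(len(x))
    ]

def causal_attention_mask(n: int) -> List[List[float]]:
    """Additive mask: 0.0 where frame i may attend to frame j (j <= i), -inf otherwise."""
    return [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]

def masked_softmax_row(scores: List[float], mask_row: List[float]) -> List[float]:
    """Softmax over one row of attention scores after adding the causal mask."""
    vals = [s + m for s, m in zip(scores, mask_row)]
    mx = max(vals)  # finite, because frame i can always attend to itself
    exps = [math.exp(v - mx) for v in vals]  # math.exp(-inf) == 0.0, so future frames get zero weight
    total = sum(exps)
    return [e / total for e in exps]

def vector_quantize(
    frames: List[List[float]], codebook: List[List[float]]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each frame vector with its nearest codeword (squared Euclidean distance).

    Returns the chosen codebook indices and the quantized frames; only the
    inference-time lookup is shown, not codebook training.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        idx = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized

if __name__ == "__main__":
    print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))  # [0.5, 1.5, 2.5, 3.5]
    mask = causal_attention_mask(3)
    print(masked_softmax_row([0.0, 0.0, 9.0], mask[1]))  # [0.5, 0.5, 0.0]
    print(vector_quantize([[0.9, 1.2], [0.1, -0.2]], [[0.0, 0.0], [1.0, 1.0]]))
    # ([1, 0], [[1.0, 1.0], [0.0, 0.0]])
```

In this toy setting, changing a later input sample never alters earlier convolution outputs, and the masked softmax gives zero weight to future frames even when their raw scores are large. The quantizer maps small perturbations of a frame back to the same codeword, which gives one intuition for why discretization can absorb part of the noise component.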
