Self-Supervised Learning for Speech Enhancement through Synthesis

11/04/2022
by   Bryce Irvin, et al.
0

Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative speech synthesis, where the system's output is synthesized by a neural vocoder after an inherently lossy feature-denoising step. In this paper, we propose a denoising vocoder (DeVo) approach, where a vocoder accepts noisy representations and learns to directly synthesize clean speech. We leverage rich representations from self-supervised learning (SSL) speech models to discover relevant features. We conduct a candidate search across 15 potential SSL front-ends and subsequently train our vocoder adversarially with the best SSL configuration. Additionally, we demonstrate a causal version capable of running on streaming audio with 10ms latency and minimal performance degradation. Finally, we conduct both objective evaluations and subjective listening studies to show our system improves objective metrics and outperforms an existing state-of-the-art SE model subjectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/18/2020

Self-supervised Learning for Speech Enhancement

Supervised learning for single-channel speech enhancement requires caref...
research
05/24/2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Self-supervised learning (SSL) is the latest breakthrough in speech proc...
research
01/11/2023

Perceive and predict: self-supervised speech representation based loss functions for speech enhancement

Recent work in the domain of speech enhancement has explored the use of ...
research
07/27/2023

The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions

Recent work in the field of speech enhancement (SE) has involved the use...
research
07/11/2023

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

Self-supervised learning (SSL) speech representations learned from large...
research
06/10/2022

Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation

Self-supervised learning (SSL) achieves great success in monaural speech...
research
02/16/2023

Speech Enhancement with Multi-granularity Vector Quantization

With advances in deep learning, neural network based speech enhancement ...

Please sign up or login with your details

Forgot password? Click here to reset