Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement

11/14/2019
by Soumi Maiti, et al.

Traditional speech enhancement systems produce speech with compromised quality. Here we propose to use the high-quality speech generation capability of neural vocoders for better-quality speech enhancement. We term this parametric resynthesis (PR). In previous work, we showed that PR systems generate high-quality speech for a single speaker using two neural vocoders, WaveNet and WaveGlow. Both of these vocoders are traditionally speaker dependent. Here we first show that, when trained on data from enough speakers, these vocoders can generate speech from unseen speakers, both male and female, with quality similar to that of speakers seen in training. Next, using these two vocoders and a new vocoder, LPCNet, we evaluate the noise reduction quality of PR on unseen speakers and show that objective signal and overall quality are higher than those of the state-of-the-art speech enhancement systems Wave-U-Net, Wavenet-denoise, and SEGAN. Moreover, in subjective quality, multiple-speaker PR outperforms the oracle Wiener mask.


