
Parametric Resynthesis with neural vocoders

by   Soumi Maiti, et al.
CUNY Law School

Noise suppression systems generally produce output speech with compromised quality. We propose to exploit the high-quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech, and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++, and both also achieve significantly better subjective quality ratings than the oracle Wiener mask. Between the two, WaveNet achieves the better subjective quality scores, although at the cost of much slower waveform generation.
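The two-stage pipeline the abstract describes (predict clean mel-spectrogram features from noisy speech, then vocode them back to a waveform) can be sketched as below. This is a minimal illustrative sketch, not the paper's actual architecture: the `MelPredictor` network, its layer sizes, and the bidirectional-GRU choice are assumptions, and in the real system a pretrained WaveNet or WaveGlow vocoder would consume the predicted mel frames.

```python
import torch
import torch.nn as nn

class MelPredictor(nn.Module):
    """Hypothetical stand-in for the paper's prediction network:
    maps noisy mel-spectrogram frames to clean mel-spectrogram frames."""

    def __init__(self, n_mels: int = 80, hidden: int = 256):
        super().__init__()
        # Bidirectional GRU over the time axis of the mel spectrogram.
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        # Project the 2*hidden GRU features back to n_mels per frame.
        self.proj = nn.Linear(2 * hidden, n_mels)

    def forward(self, noisy_mel: torch.Tensor) -> torch.Tensor:
        # noisy_mel: (batch, time, n_mels)
        features, _ = self.rnn(noisy_mel)
        return self.proj(features)

# Stage 1: predict clean mel frames from (fake) noisy input.
predictor = MelPredictor()
noisy_mel = torch.randn(1, 100, 80)      # (batch, frames, mel bins)
clean_mel_est = predictor(noisy_mel)     # same shape as the input

# Stage 2 (not shown): a pretrained neural vocoder, e.g. WaveNet or
# WaveGlow, would synthesize a clean waveform from clean_mel_est.
print(tuple(clean_mel_est.shape))
```

The key design point from the paper is that stage 1 only has to get the compact mel representation right; the heavy lifting of producing natural-sounding waveforms is delegated to a vocoder trained on clean speech.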


Speech denoising by parametric resynthesis

This work proposes the use of clean speech vocoder parameters as the tar...

DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

Human subjective evaluation is the gold standard to evaluate speech qual...

Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

Audio codecs based on discretized neural autoencoders have recently been...

Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional...

Individually amplified text-to-speech

Text-to-speech (TTS) offers the opportunity to compensate for a hearing ...

Handling Background Noise in Neural Speech Generation

Recent advances in neural-network based generative modeling of speech ha...

A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals

The real-world capabilities of objective speech quality measures are lim...