Neural Vocoder is All You Need for Speech Super-resolution

03/28/2022
by   Haohe Liu, et al.

Speech super-resolution (SR) is the task of increasing the sampling rate of speech by generating its missing high-frequency components. Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio. These strong constraints can lead to poor generalization in mismatched real-world cases. In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios. NVSR consists of a mel-bandwidth extension module, a neural vocoder module, and a post-processing module. Our proposed system achieves state-of-the-art results on the VCTK multi-speaker benchmark. At a 44.1 kHz target resolution, NVSR outperforms WSRGlow and Nu-wave by 8% and 37% respectively on log-spectral distance, and achieves significantly better perceptual quality. We also demonstrate that prior knowledge in the pre-trained vocoder is crucial for speech SR by performing mel-bandwidth extension with a simple replication-padding method. Samples can be found at https://haoheliu.github.io/nvsr.
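
The abstract does not spell out how the replication-padding baseline works. The snippet below is a minimal sketch of one plausible reading, not the authors' implementation: mel bins above the input's bandwidth limit are filled by repeating the highest valid bin, and the result would then go to a pre-trained vocoder. The function name `replication_pad_mel`, the `cutoff_bin` parameter, and the `(n_mels, n_frames)` array layout are all assumptions made for illustration.

```python
import numpy as np


def replication_pad_mel(mel: np.ndarray, cutoff_bin: int) -> np.ndarray:
    """Fill mel bins at and above `cutoff_bin` by repeating the last valid bin.

    Assumed layout: `mel` has shape (n_mels, n_frames), with bins below
    `cutoff_bin` holding the valid low-resolution content.
    """
    n_mels = mel.shape[0]
    if not 0 < cutoff_bin <= n_mels:
        raise ValueError("cutoff_bin must be in the range 1..n_mels")
    valid = mel[:cutoff_bin]
    n_missing = n_mels - cutoff_bin
    # mode="edge" repeats the edge row, i.e. the highest valid mel bin.
    return np.pad(valid, ((0, n_missing), (0, 0)), mode="edge")


# Toy input: 8 mel bins, 3 frames, only the lowest 4 bins carry content.
mel = np.zeros((8, 3))
mel[:4] = np.arange(12, dtype=float).reshape(4, 3)
extended = replication_pad_mel(mel, cutoff_bin=4)
```

In this sketch `extended` keeps the lowest four bins unchanged and copies bin 3 into bins 4 through 7. Any high-frequency detail would have to come from the vocoder, which fits the abstract's claim that the vocoder's prior knowledge is crucial.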

