
Conditioning and Sampling in Variational Diffusion Models for Speech Super-resolution

by Chin-Yun Yu, et al.

Diffusion models (DMs) are increasingly used in audio processing tasks, including speech super-resolution (SR), which aims to restore high-frequency content from low-resolution speech utterances. This is commonly achieved by conditioning the noise-predictor network on the low-resolution audio. In this paper, we propose a novel sampling algorithm that communicates the information in the low-resolution audio through the reverse sampling process of DMs. The proposed method can serve as a drop-in replacement for the vanilla sampling process and can significantly improve the performance of existing works. Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM with no auxiliary inputs to its noise predictor, we can generalize it to a wide range of SR setups. We also attain state-of-the-art results on the VCTK Multi-Speaker benchmark with this novel formulation.
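To make the idea of passing low-resolution information through the reverse process more concrete, here is a minimal sketch of one way such a step could look. It is not taken from the paper: the function names (`lowpass`, `inject_low_band`), the ideal FFT low-pass filter, and the scalar noise-schedule values `alpha_t` and `sigma_t` are all assumptions made for illustration. The sketch noises the low-resolution reference to the current diffusion level, then overwrites the low-frequency band of the intermediate sample with it while leaving the generated high-frequency band untouched.

```python
import numpy as np


def lowpass(x, keep_ratio):
    """Ideal low-pass filter (an assumption): keep only the lowest
    `keep_ratio` fraction of rFFT bins and zero the rest."""
    spec = np.fft.rfft(x)
    cutoff = int(len(spec) * keep_ratio)
    spec[cutoff:] = 0.0
    return np.fft.irfft(spec, n=len(x))


def inject_low_band(z_t, y_low, alpha_t, sigma_t, keep_ratio, rng):
    """Hypothetical consistency step for one reverse-diffusion iteration.

    z_t        : current noisy sample produced by the sampler
    y_low      : low-resolution reference, already resampled to len(z_t)
    alpha_t    : signal scale at this step (assumed schedule value)
    sigma_t    : noise scale at this step (assumed schedule value)
    keep_ratio : fraction of the spectrum covered by the reference
    """
    # Bring the reference to the same noise level as z_t.
    noise = rng.standard_normal(z_t.shape)
    y_t = alpha_t * y_low + sigma_t * noise
    # Low band comes from the noised reference, high band from the sample.
    return lowpass(y_t, keep_ratio) + (z_t - lowpass(z_t, keep_ratio))
```

In a sampler, a step like this would be applied after each ordinary reverse update, which is why no change to the noise predictor itself is needed in this sketch. Whether the filter, the noise level, or the order of operations matches the authors' algorithm is not stated in the abstract.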




