NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

06/17/2022
by   Seungu Han, et al.

Conventional audio super-resolution models fix the initial and target sampling rates, which requires training a separate model for each pair of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio upsampling that generates 48 kHz audio signals from inputs of various sampling rates with a single model. Building on the architecture of NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate harmonics, which resolves the main failure modes of NU-Wave, and incorporates bandwidth spectral feature transform (BSFT) to condition on the bandwidths of the inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the input sampling rate while requiring fewer parameters than other models. The official code and the audio samples are available at https://mindslab-ai.github.io/nuwave2.
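
To make "the bandwidth of an input in the frequency domain" concrete, the following is a minimal sketch, not the BSFT module or any code from the official repository. It relies only on the Nyquist limit: a signal originally sampled at a given rate can carry content up to half that rate, so in a 48 kHz spectrogram only the bins below that limit are occupied. The function name `band_mask` and the FFT size of 1024 are assumptions chosen for illustration.

```python
TARGET_SR = 48_000  # target sampling rate stated in the abstract (48 kHz)


def band_mask(input_sr: int, n_fft: int = 1024, target_sr: int = TARGET_SR) -> list[int]:
    """Return a 0/1 mask over the n_fft // 2 + 1 frequency bins of a target_sr signal.

    Bin k is centred at k * target_sr / n_fft Hz. A bin is marked 1 when its
    centre frequency lies at or below the Nyquist frequency of the original
    input (input_sr / 2), and 0 otherwise.
    Hypothetical helper: name and default n_fft are illustrative assumptions.
    """
    if not 0 < input_sr <= target_sr:
        raise ValueError("input_sr must be in (0, target_sr]")
    nyquist_hz = input_sr / 2
    n_bins = n_fft // 2 + 1
    return [1 if k * target_sr / n_fft <= nyquist_hz else 0 for k in range(n_bins)]


if __name__ == "__main__":
    # Count the occupied bins for a few example input sampling rates.
    for sr in (8_000, 16_000, 24_000, 48_000):
        mask = band_mask(sr)
        print(f"{sr} Hz input: {sum(mask)} of {len(mask)} bins occupied")
```

A single model that accepts several input sampling rates needs some indicator of this kind to know which part of the spectrum is already present and which part has to be generated; how NU-Wave 2 actually encodes and uses that information is described in the full paper.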

research
04/06/2021

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

In this work, we introduce NU-Wave, the first neural audio upsampling mo...
research
05/06/2021

Point Cloud Audio Processing

Most audio processing pipelines involve transformations that act on fixe...
research
10/28/2022

Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs

Neural audio super-resolution models are typically trained on low- and h...
research
01/29/2022

ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation

In this paper, we propose a vocoder based on a pair of forward and rever...
research
11/22/2022

AERO: Audio Super Resolution in the Spectral Domain

We present AERO, an audio super-resolution model that processes speech an...
research
05/17/2021

ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation

In this paper, we propose to unify the two aspects of voice synthesis, n...
research
09/25/2022

Multimodal Exponentially Modified Gaussian Oscillators

Acoustic modeling serves audio processing tasks such as de-noising, data...
