From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

08/02/2023
by   Robin San-Roman, et al.

Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generating audible artifacts when the conditioning is flawed or imperfect. An alternative modeling approach is to use diffusion models. However, these have mainly been used as speech vocoders (i.e., conditioned on mel-spectrograms) or to generate signals at relatively low sampling rates. In this work, we propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality (e.g., speech, music, environmental sounds) from low-bitrate discrete representations. At equal bit rate, the proposed approach outperforms state-of-the-art generative techniques in terms of perceptual quality. Training and evaluation code, along with audio samples, are available on the facebookresearch/audiocraft GitHub page.
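To make "low-bitrate discrete representations" and "at equal bit rate" concrete, the sketch below computes the bitrate of a generic stream of codec tokens. It is a minimal illustration, not the paper's code: the helper name `token_bitrate_bps` is hypothetical, and the frame rate, codebook size, and codebook counts are assumed example values that do not come from the abstract.

```python
import math


def token_bitrate_bps(frame_rate_hz: float, n_codebooks: int, codebook_size: int) -> float:
    """Bits per second carried by a stream of discrete codec tokens.

    Each frame emits one token per codebook, and each token carries
    log2(codebook_size) bits.
    """
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * n_codebooks * bits_per_token


# Assumed, illustrative settings (not stated in the abstract):
# 75 token frames per second, 1024 entries per codebook.
for n_codebooks in (2, 4, 8):
    kbps = token_bitrate_bps(75, n_codebooks, 1024) / 1000
    print(f"{n_codebooks} codebooks -> {kbps:.1f} kbps")
```

Under these assumed settings, two decoders are compared "at equal bit rate" when they receive token streams with the same frame rate, number of codebooks, and codebook size.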
