BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

05/30/2022
by Yichong Leng, et al.

Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio in the real world, synthesizing it from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberation and head/ear-related filtering, which are difficult to simulate accurately with traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part that is shared by the left and right channels and a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize the two parts respectively. Specifically, in the first stage, the common information of the binaural audio is generated with a single-channel diffusion model conditioned on the mono audio; in the second stage, a two-channel diffusion model generates the binaural audio based on that common information. By combining this two-stage perspective with advanced generative models (i.e., diffusion models), BinauralGrad is able to generate accurate and high-fidelity binaural audio samples. Experimental results show that on a benchmark dataset, BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics (Wave L2: 0.128 vs. 0.157, MOS: 3.80 vs. 3.61). The generated audio samples are available online.
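To make the common/specific decomposition concrete, here is a minimal Python sketch. It assumes, as one natural choice, that the common part is the sample-wise average of the left and right channels, so each channel's specific part is its residual from that average; the function names `decompose` and `recompose` are hypothetical and not taken from the authors' code.

```python
from typing import List, Tuple


def decompose(left: List[float], right: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """Split a binaural pair into a shared part and per-channel residuals.

    Assumption (not confirmed by the abstract): the common part is the
    sample-wise average of the two channels.
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    common = [(l + r) / 2.0 for l, r in zip(left, right)]
    spec_left = [l - c for l, c in zip(left, common)]    # left-specific residual
    spec_right = [r - c for r, c in zip(right, common)]  # right-specific residual
    return common, spec_left, spec_right


def recompose(common: List[float], spec_left: List[float],
              spec_right: List[float]) -> Tuple[List[float], List[float]]:
    """Invert decompose(): add each channel's residual back onto the common part."""
    left = [c + s for c, s in zip(common, spec_left)]
    right = [c + s for c, s in zip(common, spec_right)]
    return left, right


common, spec_left, spec_right = decompose([1.0, 0.5], [0.0, 0.5])
# common == [0.5, 0.5], spec_left == [0.5, 0.0], spec_right == [-0.5, 0.0]
```

Under this averaging assumption the two specific parts are exact negatives of each other, so all channel-specific information is carried by the left/right difference, while the common part resembles a mono signal.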

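The two-stage generation described in the abstract can be sketched with a generic DDPM-style ancestral sampler, used here as a stand-in rather than the paper's exact procedure. The linear noise schedule, the hypothetical `stage1`/`stage2` noise-prediction callables, and the choice to condition stage two on both the mono input and the stage-one output are illustrative assumptions; any further conditioning the full model may use is omitted.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Signal = List[List[float]]  # channels x samples
# Hypothetical denoiser signature: (noisy x_t, step index t, conditioning) -> predicted noise
Denoiser = Callable[[Signal, int, Signal], Signal]


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.05) -> List[float]:
    """Linear noise schedule (an assumed choice, not the paper's setting)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def ddpm_sample(denoiser: Denoiser, cond: Signal, channels: int, length: int,
                betas: List[float], rng: random.Random) -> Signal:
    """DDPM ancestral sampling from x_T ~ N(0, I) down to x_0."""
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product of alphas up to step t

    x = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(channels)]
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t, cond)
        scale = 1.0 / math.sqrt(alphas[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Estimated mean of x_{t-1}: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = [[scale * (xv - coef * ev) for xv, ev in zip(xc, ec)]
             for xc, ec in zip(x, eps)]
        if t > 0:  # add fresh noise on every step except the final one
            sigma = math.sqrt(betas[t])
            x = [[xv + sigma * rng.gauss(0.0, 1.0) for xv in xc] for xc in x]
    return x


def binaural_two_stage(mono: Sequence[float], stage1: Denoiser, stage2: Denoiser,
                       betas: List[float], seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    length = len(mono)
    # Stage 1: single-channel model generates the common part from the mono input.
    common = ddpm_sample(stage1, [list(mono)], 1, length, betas, rng)
    # Stage 2: two-channel model generates left/right, conditioned on mono + common part
    # (conditioning on both is an assumption of this sketch).
    binaural = ddpm_sample(stage2, [list(mono), common[0]], 2, length, betas, rng)
    return binaural[0], binaural[1]


def zero_denoiser(x: Signal, t: int, cond: Signal) -> Signal:
    """Placeholder that predicts zero noise; a trained network would go here."""
    return [[0.0] * len(xc) for xc in x]


left, right = binaural_two_stage([0.1, 0.2, 0.3, 0.4], zero_denoiser, zero_denoiser,
                                 linear_betas(10))
```

With the placeholder `zero_denoiser` the output is only rescaled noise; the point of the sketch is the data flow, where the second stage sees the first stage's single-channel result as an extra conditioning signal instead of generating both channels from the mono input alone.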
research
06/12/2023

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Recently, denoising diffusion models have demonstrated remarkable perfor...
research
08/02/2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Deep generative models can generate high-fidelity audio conditioned on v...
research
08/10/2023

Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling

Channel modelling is essential to designing modern wireless communicatio...
research
02/08/2023

Noise2Music: Text-conditioned Music Generation with Diffusion Models

We introduce Noise2Music, where a series of diffusion models is trained ...
research
09/13/2023

Diffusion models for audio semantic communication

Directly sending audio signals from a transmitter to a receiver across a...
research
04/06/2021

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

In this work, we introduce NU-Wave, the first neural audio upsampling mo...
research
10/26/2022

Full-band General Audio Synthesis with Score-based Diffusion

Recent works have shown the capability of deep generative models to tack...